ETAPS Foreword 


Welcome to the proceedings of ETAPS 2018! After a somewhat coldish ETAPS 2017 
in Uppsala in the north, ETAPS this year took place in Thessaloniki, Greece. I am 
happy to announce that this is the first ETAPS with gold open access proceedings. This 
means that all papers are accessible by anyone for free. 

ETAPS 2018 was the 21st instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. 
Each conference has its own Program Committee (PC) and its own Steering Com- 
mittee. The conferences cover various aspects of software systems, ranging from 
theoretical computer science to foundations to programming language developments, 
analysis tools, formal approaches to software engineering, and security. Organizing 
these conferences in a coherent, highly synchronized conference program facilitates 
participation in an exciting event, offering attendees the possibility to meet many 
researchers working in different directions in the field, and to easily attend talks of 
different conferences. Before and after the main conference, numerous satellite work- 
shops take place and attract many researchers from all over the globe. 

ETAPS 2018 received 479 submissions in total, 144 of which were accepted, 
yielding an overall acceptance rate of 30%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their peer reviewing efforts, the PC members for their 
contributions, and in particular the PC (co-)chairs for their hard work in running this 
entire intensive process. Last but not least, my congratulations to all authors of the 
accepted papers! 

ETAPS 2018 was enriched by the unifying invited speaker Martin Abadi (Google 
Brain, USA) and the conference-specific invited speakers (FASE) Pamela Zave (AT&T 
Labs, USA), (POST) Benjamin C. Pierce (University of Pennsylvania, USA), and 
(ESOP) Derek Dreyer (Max Planck Institute for Software Systems, Germany). Invited 
tutorials were provided by Armin Biere (Johannes Kepler University, Linz, Austria) on 
modern SAT solving and Fabio Somenzi (University of Colorado, Boulder, USA) on 
hardware verification. My sincere thanks to all these speakers for their inspiring and 
interesting talks! 

ETAPS 2018 took place in Thessaloniki, Greece, and was organised by the 
Department of Informatics of the Aristotle University of Thessaloniki. The university 
was founded in 1925 and currently has around 75,000 students; it is the largest uni- 
versity in Greece. ETAPS 2018 was further supported by the following associations 
and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer 
Science), EAPLS (European Association for Programming Languages and Systems), 
and EASST (European Association of Software Science and Technology). The local 
organization team consisted of Panagiotis Katsaros (general chair), Ioannis Stamelos, 
Lefteris Angelis, George Rahonis, Nick Bassiliades, Alexander Chatzigeorgiou, Ezio 
Bartocci, Simon Bliudze, Emmanouela Stachtiari, Kyriakos Georgiadis, and Petros 
Stratis (EasyConferences). 

The overall planning for ETAPS is the main responsibility of the Steering Com- 
mittee, and in particular of its Executive Board. The ETAPS Steering Committee 
consists of an Executive Board and representatives of the individual ETAPS confer- 
ences, as well as representatives of EATCS, EAPLS, and EASST. The Executive 
Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter 
Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone 
(Southampton), Tarmo Uustalu (Tallinn), and Lenore Zuck (Chicago). Other members 
of the Steering Committee are: Wil van der Aalst (Aachen), Parosh Abdulla (Uppsala), 
Amal Ahmed (Boston), Christel Baier (Dresden), Lujo Bauer (Pittsburgh), Dirk Beyer 
(Munich), Mikolaj Bojanczyk (Warsaw), Luis Caires (Lisbon), Jurriaan Hage 
(Utrecht), Reiner Hähnle (Darmstadt), Reiko Heckel (Leicester), Marieke Huisman 
(Twente), Panagiotis Katsaros (Thessaloniki), Ralf Küsters (Stuttgart), Ugo Dal Lago 
(Bologna), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria 
(Limerick), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), 
Andrew M. Pitts (Cambridge), Alessandra Russo (London), Dave Sands (Göteborg), 
Don Sannella (Edinburgh), Andy Schürr (Darmstadt), Alex Simpson (Ljubljana), 
Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas 
Vojnar (Brno), and Lijun Zhang (Beijing). 

I would like to take this opportunity to thank all speakers, attendees, organizers 
of the satellite workshops, and Springer for their support. I hope you all enjoy the 
proceedings of ETAPS 2018. Finally, a big thanks to Panagiotis and his local orga- 
nization team for all their enormous efforts that led to a fantastic ETAPS in 
Thessaloniki! 


February 2018 Joost-Pieter Katoen 


Preface 


This book contains the proceedings of FASE 2018, the 21st International Conference 
on Fundamental Approaches to Software Engineering, held in Thessaloniki, Greece, in 
April 2018, as part of the annual European Joint Conferences on Theory and Practice of 
Software (ETAPS 2018). 

As usual for FASE, the contributions combine the development of conceptual and 
methodological advances with their formal foundations, tool support, and evaluation on 
realistic or pragmatic cases. As a result, the volume contains regular research papers 
that cover a wide range of topics, such as program and system analysis, model 
transformations, configuration and synthesis, graph modeling and transformation, 
software product lines, test selection, as well as learning and inference. We hope that 
the community will find this volume engaging and worth reading. 

The contributions included have been carefully selected. For the third time, FASE 
used a double-blind review process, as the past two years’ experiments were considered 
valuable by authors and worth the additional effort of anonymizing the papers. We 
received 77 abstract submissions from 24 different countries, from which 63 full-paper 
submissions materialized. All papers were reviewed by three experts in the field, and 
after intense discussion, only 19 were accepted, giving an acceptance rate of 30%. 

We thank the ETAPS 2018 general chair Panagiotis Katsaros, the ETAPS organizers, 
Ioannis Stamelos, Lefteris Angelis, and George Rahonis, the ETAPS publicity 
chairs, Ezio Bartocci and Simon Bliudze, as well as the ETAPS SC chair, Joost-Pieter 
Katoen, for their support during the whole process. We thank all the authors for their 
hard work and willingness to contribute. Last but not least, we thank all the Program 
Committee members and external reviewers, who invested time and effort in the 
selection process to ensure the scientific quality of the program. 


February 2018 Alessandra Russo 
Andy Schürr 


Organization


Program Committee

Ruth Breu                 Universität Innsbruck, Austria
Yuanfang Cai              Drexel University, USA
Sagar Chaki               Carnegie Mellon University, USA
Hana Chockler             King's College London, UK
Ewen Denney               NASA Ames, USA
Stefania Gnesi            ISTI-CNR, Italy
Dilian Gurov              Royal Institute of Technology (KTH), Sweden
Zhenjiang Hu              National Institute for Informatics, Japan
Reiner Hähnle             Darmstadt University of Technology, Germany
Valerie Issarny           Inria, France
Einar Broch Johnsen       University of Oslo, Norway
Gerti Kappel              Vienna University of Technology, Austria
Ekkart Kindler            Technical University of Denmark, Denmark
Kim Mens                  Université catholique de Louvain, Belgium
Fernando Orejas           Universitat Politècnica de Catalunya, Spain
Fabrizio Pastore          University of Luxembourg, Luxembourg
Arend Rensink             Universiteit Twente, The Netherlands
Leila Ribeiro             Universidade Federal do Rio Grande do Sul, Brazil
Julia Rubin               The University of British Columbia, Canada
Bernhard Rumpe            RWTH Aachen, Germany
Alessandra Russo          Imperial College London, UK
Rick Salay                University of Toronto, Canada
Ina Schaefer              Technische Universität Braunschweig, Germany
Andy Schürr               Darmstadt University of Technology, Germany
Marjan Sirjani            Reykjavik University, Iceland
Wil Van der Aalst         RWTH Aachen, Germany
Daniel Varro              Budapest University of Technology and Economics, Hungary
Virginie Wiels            ONERA/DTIM, France
Yingfei Xiong             Peking University, China
Didar Zowghi              University of Technology Sydney, Australia


Additional Reviewers


Adam, Kai 

Ahmed, Khaled E. 
Alrajeh, Dalal 
Auer, Florian 
Basile, Davide 
Bergmann, Gábor 
Bill, Robert 

Bubel, Richard 
Búr, Márton 

Chen, Yifan 
Cicchetti, Antonio 
de Vink, Erik 
Dulay, Naranker 
Feng, Qiong 
Guimaraes, Everton 
Haeusler, Martin 
Haglund, Jonas 
Haubrich, Olga 
Herda, Mihai 
Hillemacher, Steffen 
Huber, Michael 
Jafari, Ali 

Jiang, Jiajun 
Johansen, Christian 
Joosten, Sebastiaan 
Kamburjan, Eduard 
Kautz, Oliver 
Khamespanah, Ehsan 
Knüppel, Alexander 
Laurent, Nicolas 
Leblebici, Erhan 
Liang, Jingjing 
Lindner, Andreas 
Lity, Sascha 


Lochau, Malte 
Markthaler, Matthias 
Mauro, Jacopo 
Melgratti, Hernan 
Micskei, Zoltan 
Mohaqeqi, Morteza 
Mousavi, Mohamad 
Nesic, Damir 
Nieke, Michael 
Pun, Ka I. 

Saake, Gunter 
Sauerwein, Clemens 
Schlatte, Rudolf 
Schuster, Sven 
Seidl, Martina 
Semeráth, Oszkár 
Shaver, Chris 
Shumeiko, Igor 
Steffen, Martin 
Steinebach, Martin 
Steinhöfel, Dominic 
Stolz, Volker 

Tapia Tarifa, Silvia Lizeth 
Ter Beek, Maurice H. 
Tiezzi, Francesco 
Varga, Simon 
Wally, Bernhard 
Wang, Bo 
Weckesser, Markus 
Whiteside, Iain 
Wimmer, Manuel 
Wolny, Sabine 
Xiao, Lu 

Yue, Ruru 


Contents 


Model-Based Software Development 

A Formal Framework for Incremental Model Slicing 
   Gabriele Taentzer, Timo Kehrer, Christopher Pietsch, and Udo Kelter 

Multiple Model Synchronization with Multiary Delta Lenses 
   Zinovy Diskin, Harald König, and Mark Lawford 

Controlling the Attack Surface of Object-Oriented Refactorings 
   Sebastian Ruland, Géza Kulcsar, Erhan Leblebici, Sven Peldszus, 
   and Malte Lochau 

Effective Analysis of Attack Trees: A Model-Driven Approach 
   Rajesh Kumar, Stefano Schivo, Enno Ruijters, Bugra Mehmet Yildiz, 
   David Huistra, Jacco Brandt, Arend Rensink, and Mariëlle Stoelinga 


Distributed Program and System Analysis 

ROLA: A New Distributed Transaction Protocol and Its Formal Analysis 
   Si Liu, Peter Csaba Ölveczky, Keshav Santhanam, Qi Wang, 
   Indranil Gupta, and José Meseguer 

A Process Network Model for Reactive Streaming Software 
with Deterministic Task Parallelism 
   Fotios Gioulekas, Peter Poplavko, Panagiotis Katsaros, 
   Saddek Bensalem, and Pedro Palomo 

Distributed Graph Queries for Runtime Monitoring 
of Cyber-Physical Systems 
   Marton Bur, Gabor Szilagyi, Andras Vörös, and Daniel Varro 

EventHandler-Based Analysis Framework for Web Apps 
Using Dynamically Collected States 
   Joonyoung Park, Kwangwon Sun, and Sukyoung Ryu 


Software Design and Verification 

Hierarchical Specification and Verification of Architectural 
Design Patterns 
   Diego Marmsoler 

Supporting Verification-Driven Incremental Distributed Design 
of Components 
   Claudio Menghi, Paola Spoletini, Marsha Chechik, and Carlo Ghezzi 

Summarizing Software API Usage Examples 
Using Clustering Techniques 
   Nikolaos Katirtzis, Themistoklis Diamantopoulos, and Charles Sutton 

Fast Computation of Arbitrary Control Dependencies 
   Jean-Christophe Léchenet, Nikolai Kosmatov, and Pascale Le Gall 


Specification and Program Testing 

Iterative Generation of Diverse Models for Testing Specifications 
of DSL Tools 
   Oszkar Semerath and Daniel Varro 

Optimising Spectrum Based Fault Localisation for Single Fault Programs 
Using Specifications 
   David Landsberg, Youcheng Sun, and Daniel Kroening 

TCM: Test Case Mutation to Improve Crash Detection in Android 
   Yavuz Koroglu and Alper Sen 

CRETE: A Versatile Binary-Level Concolic Testing Framework 
   Bo Chen, Christopher Havlicek, Zhenkun Yang, Kai Cong, 
   Raghudeep Kannavara, and Fei Xie 


Family-Based Software Development 

Abstract Family-Based Model Checking Using Modal Featured Transition 
Systems: Preservation of CTL* 
   Aleksandar S. Dimovski 

FPH: Efficient Non-commutativity Analysis of Feature-Based Systems 
   Marsha Chechik, Ioanna Stavropoulou, Cynthia Disenfeld, 
   and Julia Rubin 

Taming Multi-Variability of Software Product Line Transformations 
   Daniel Strüber, Sven Peldszus, and Jan Jürjens 


Author Index 


Model-Based Software Development 




A Formal Framework for Incremental 
Model Slicing 


Gabriele Taentzer¹, Timo Kehrer², Christopher Pietsch³, 
and Udo Kelter³ 


1 Philipps-Universität Marburg, Marburg, Germany 
2 Humboldt-Universität zu Berlin, Berlin, Germany 
3 University of Siegen, Siegen, Germany 
cpietsch@informatik.uni-siegen.de 


Abstract. Program slicing is a technique which can determine the sim- 
plest program possible that maintains the meaning of the original pro- 
gram w.r.t. a slicing criterion. The concept of slicing has been transferred 
to models, in particular to statecharts. In addition to the classical use 
cases of slicing adopted from the field of program understanding, model 
slicing is also motivated by specifying submodels of interest to be fur- 
ther processed more efficiently, thus dealing with scalability issues when 
working with very large models. Slices are often updated throughout spe- 
cific software development tasks. Such a slice update can be performed 
by creating the new slice from scratch or by incrementally updating the 
existing slice. In this paper, we present a formal framework for defining 
model slicers that support incremental slice updates. This framework 
abstracts from the behavior of concrete slicers as well as from the concrete 
model modification approach. It forms a guideline for defining incremen- 
tal model slicers independent of the underlying slicer’s semantics. Incre- 
mental slice updates are shown to be equivalent to non-incremental ones. 
Furthermore, we present a framework instantiation based on the concept 
of edit scripts defining application sequences of model transformation 
rules. We implemented two concrete model slicers for this instantiation 
based on the Eclipse Modeling Framework. 


1 Introduction 


Program slicing as introduced by Weiser [1] is a technique which determines 
those parts of a program (the slice) which may affect the values of a set of 
(user-)selected variables at a specific point (the slicing criterion). Since the sem- 
inal work of Weiser, which calculates a slice by utilizing static data and control 
flow analysis and which primarily focuses on assisting developers in debugging, 
a plethora of program slicing techniques addressing a broad range of use cases 
have been proposed [2]. 

With the advent of Model-Driven Engineering (MDE) [3], models rather than 
source code play the role of primary software development artifacts. Similar use 
cases as known from program slicing must be supported for model slicing [4-6]. In 
addition to classical use cases adopted from the field of program understanding, 
model slicing is often motivated by scalability issues when working with very 
large models [7,8], which has often been mentioned as one of the biggest obstacles 
in applying MDE in practice [9,10]. Modeling frameworks such as the Eclipse 
Modeling Framework (EMF) and widely-used model management tools do not 
scale beyond a few tens of thousands of model elements [11], while large-scale 
industrial models are considerably larger [12]. As a consequence, such models 
cannot even be edited in standard model editors. Thus, the extraction of editable 
submodels from a larger model is the only viable solution to support efficient 
yet independent editing of huge monolithic models [8]. Further example scenarios 
in which model slices may be constructed for the sake of efficiency include model 
checkers, test suite generators, etc., in order to reduce runtimes and memory 
consumption. 

Slicing criteria are often modified during software development tasks. This 
leads to corresponding slice updates (also called slice adaptations in [8]). During 
a debugging session, e.g., the slicing criterion might need to be modified in order 
to inspect different debugging hypotheses more closely. The independent editing of 
submodels is another example of this. Here, a slice created for an initial slicing 
criterion can turn out to be inappropriate, most typically because additional 
model elements are desired or because the slice is still too large. These slice 
update scenarios have in common that the original slicing criterion is modified 
and that the existing slice must be updated w.r.t. the new slicing criterion. 

Model slicing faces two challenging requirements which do not exist or 
which are of minor importance for traditional program slicers. First, the increasing 
importance and prevalence of domain-specific modeling languages (DSMLs) 
as well as a considerable number of different use cases lead to a huge number of 
different concrete slicers; examples are presented in Sect. 2. Thus, methods 
for developing model slicers should abstract from a slicer’s concrete behavior 
(and thus from concrete modeling languages) as far as possible. Ideally, model 
slicers should be generic in the sense that the behavior of a slicer is adapt- 
able with moderate configuration effort [7]. Second, rather than creating a new 
slice from scratch for a modified slicing criterion, slices must often be updated 
incrementally. This is indispensable for all use cases where slices are edited by 
developers since otherwise these slice edits would be blindly overwritten [8]. In 
addition, incremental slice updating is a desirable feature when it is more effi- 
cient than creating the slice from scratch. To date, both requirements have been 
insufficiently addressed in the literature. 

In this paper, we present a fundamental methodology for developing model 
slicers which abstract from the behavior of a concrete slicer and which support 
incremental model slicing. To be independent of a concrete DSML and use cases, 
we restrict ourselves to static slicing in order to support both executable and 
non-executable models. We make the following contributions: 


1. A formal framework for incremental model slicing which can function as a 
guideline for defining adaptable and incremental model slicers (see Sect. 3). 
This framework is based on graph-based models and model modifications and 
abstracts from the behavior of concrete slicers as well as from the concrete 
model modification approach. Within this framework we show that incremental 
slice updates are equivalent to non-incremental ones. 

2. An instantiation of this formal framework where incremental model slicers 
are specified by model patches. Two concrete model slicers were implemented for 
this instantiation based on the Eclipse Modeling Framework. 


2 Motivating Example 


In this section we introduce a running example to illustrate two use cases of 
model slicing and to motivate incremental slice updates. 

Figure 1 shows an excerpt of the system model of the Barbados Car Crash 
Crisis Management System (bCMS) [13]. It describes the operations of a police 
and a fire department in case of a crisis situation. 


[Figure: class diagram (upper part) with the classes Police Car, Fire Truck, bCMS System, PSC System, FSC System, PS coordinator and FS coordinator; state machine of PSC System (lower part) with the state Idle and the composite states Authorising (states S.1.0.0, S.1.0.1, S.1.1.0, S.1.1.1) and ExchangingCrisisDetails]

Fig. 1. Excerpt of the system model of the bCMS case study [13]. 


The system is modeled from different viewpoints. The class diagram models 
the key entities and their relationships from a static point of view. A 
police station coordinator (PS coordinator) and a fire station coordinator (FS 
coordinator) are responsible for coordinating and synchronizing the activities 
at the police and fire stations during a crisis. The interaction of both coordinators 
is managed by the respective system classes PSC System and FSC System, which 
contain several operations for, e.g., establishing the communication between the 
coordinators and exchanging crisis details. The state machine diagram models 
the dynamic view of the class PSC System, i.e., its runtime behavior for sending 
and receiving authorization credentials and crisis details to and from an FSC 
System. Initially, the PSC System is in the state Idle. The establishment of the 
communication can be triggered by calling either the operation callFScoordinator 
or reqComFSC. In the composite state Authorising, the system waits for the 
exchange of the credentials of the PS and FS coordinators, which happens by calling 
the operations sendPScoordinatorCredentials and authFSC, or vice versa. On entering 
the composite state ExchangingCrisisDetails, details can be sent by the operation 
call sendPSCrisisDetails or received by the operation call crisisDetailsFSC. 


Model Slicing. Model slicers are used to find parts of interest in a given model 
M. These parts of M are specified by a slicing criterion, which is basically a set 
of model elements or, more formally, a submodel C of M. A slicer extends C 
with further model elements of M according to the purpose of the slicer. 

We illustrate this with two use cases. Use case A is known as backward slicing 
in state-based models [4]. Given a set of states C in a statechart M as slicing 
criterion, the slicer determines all model elements which may have an effect 
on states in C. For instance, using S.1.0.1 (see the gray state in Fig. 1) as slicing 
criterion, the slicer recursively determines all incoming transitions and their 
sources, e.g., the transition with the event sendPScoordinatorCredentials and 
its source state S.1.0.0, until an initial state is reached. 

The complete backward slice is indicated by the blue elements in the lower 
part of Fig. 1. The example shows that our general notion of a slicing criterion 
may be restricted by concrete model slicers. In this use case, the slicing criterion 
cannot be an arbitrary submodel of a given larger model; it must be a very specific 
one, i.e., a set of states. 
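The backward slicer of use case A can be sketched as a worklist traversal over incoming transitions. This is a minimal illustration under simplifying assumptions, not the slicer from the paper: the statechart is flattened into (source, event, target) triples, region and containment structure are ignored, and the function name `backward_slice` as well as the initial-state names used in the check are hypothetical.

```python
from collections import deque

# Hypothetical flattened statechart: each transition is (source, event, target).
Transition = tuple[str, str, str]


def backward_slice(transitions: list[Transition], initial: set[str],
                   criterion: set[str]) -> tuple[set[str], set[Transition]]:
    """Collect the states and transitions that may affect the criterion states.

    Incoming transitions and their sources are added recursively, starting
    from the criterion; the search does not continue past initial states.
    """
    states = set(criterion)
    kept: set[Transition] = set()
    worklist = deque(criterion)
    while worklist:
        state = worklist.popleft()
        if state in initial:
            continue  # an initial state has been reached on this path
        for transition in transitions:
            source, _event, target = transition
            if target == state and transition not in kept:
                kept.add(transition)
                if source not in states:
                    states.add(source)
                    worklist.append(source)
    return states, kept
```

On a flattened version of the two regions of Authorising, slicing at S.1.0.1 keeps the path through S.1.0.0 and leaves the authFSC transition of the lower region out.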

Use case B is the extraction of editable models as presented in [8]. Here, 
the slicing criterion C is given by a set of requested model elements of M. The 
purpose of this slicer is to find a submodel which is editable and which includes 
all requested model elements. For example, if we use the blue elements in the 
lower part of Fig. 1 as slicing criterion, the model slice also contains the orange 
elements in the upper part of Fig. 1, namely three operations and the class 
containing them. The operations are included because the events of transitions 
in a statechart represent operations in the class diagram. 
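Use case B can be read as closing the criterion under dependency rules until nothing changes. The sketch below assumes two hand-written rules encoded as lookup tables (transition to called operation, operation to owning class); a real slicer for editable submodels would derive its rules from the meta-model. The names `closure`, `Rule`, `CALL_EVENT`, `OWNER` and the element id `t_cred` are hypothetical.

```python
from typing import Callable, Iterable

# A rule maps one model element to the elements it requires in the slice.
Rule = Callable[[str], Iterable[str]]


def closure(criterion: set[str], rules: list[Rule]) -> set[str]:
    """Extend the criterion with all elements required by the rules (fixpoint)."""
    result = set(criterion)
    worklist = list(criterion)
    while worklist:
        element = worklist.pop()
        for rule in rules:
            for required in rule(element):
                if required not in result:
                    result.add(required)
                    worklist.append(required)
    return result


# Hypothetical lookup tables for an excerpt of the bCMS model.
CALL_EVENT = {"t_cred": "sendPScoordinatorCredentials"}
OWNER = {"sendPScoordinatorCredentials": "PSCSystem"}

RULES: list[Rule] = [
    lambda e: [CALL_EVENT[e]] if e in CALL_EVENT else [],  # transition -> operation
    lambda e: [OWNER[e]] if e in OWNER else [],            # operation -> owning class
]
```

Because newly added elements go back onto the worklist, the owning class is reached through the operation even though no rule maps a transition to a class directly.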


Slice Update. The slicing criterion might be updated during a development 
task in order to obtain an updated slice. It is often desirable to update the 
slice rather than creating the new slice from scratch, e.g., because this is more 
efficient. Let us assume in use case A that the slicing criterion changes from 
S.1.0.1 to S.1.1.1. The resulting model slice only differs in the contained 
regions of the composite state Authorising. The upper region and its contained 
elements would be removed, while the lower region and its contained elements 
would be added. Next we could use the updated model slice from use case A as 
slicing criterion in use case B. In the related resulting model slice, the operation 
sendPScoordinatorCredentials would then be replaced by the operation 
authFSC. 
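The difference between recomputing and incrementally updating a slice becomes visible once the slice has been edited. The sketch assumes that slices are maps from element ids to a name attribute and that the element sets for the old and new criterion have already been computed by some slicer; `update_slice` and all ids are hypothetical.

```python
def update_slice(current: dict[str, str], old_ids: set[str], new_ids: set[str],
                 model: dict[str, str]) -> dict[str, str]:
    """Incrementally turn the slice for the old criterion into the one for the new criterion.

    Elements leaving the slice are deleted, elements entering it are copied from
    `model`, and all other elements (including local edits to them) are kept.
    """
    to_delete = old_ids - new_ids
    to_add = new_ids - old_ids
    updated = {eid: value for eid, value in current.items() if eid not in to_delete}
    for eid in to_add:
        updated[eid] = model[eid]
    return updated
```

Only the upper region's states are removed and the lower region's states added, so a rename applied to Authorising inside the slice survives; rebuilding the slice from `model` would silently discard it.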
3 Formal Framework 


We have seen in the motivating example that model slicers can differ consider- 
ably in their intended purpose. The formal framework we present in the following 
defines the fundamental concepts for model slicing and slice updates. This framework 
uses graph-based models and model modifications [14]. It is intended to serve as a 
guideline for defining model slicers that support incremental slice updates. 


3.1 Models as Graphs 


For models, especially visual ones, the concrete syntax is distinguished from 
the abstract one. In Fig. 1, a UML model is shown in its concrete representation. 
In the following, we reason about the underlying structure of models, i.e., their 
abstract syntax, which can be considered as a graph. The abstract syntax of a 
modeling language is usually defined by a meta-model which contains the type 
information about nodes and edges as well as additional constraints. We assume 
that a meta-model is formalized by an attributed graph; model graphs are defined 
as attributed graphs typed over the meta-model. This typing can be characterized 
by an attributed graph morphism [15]. In addition, graph constraints [16] may be 
used to specify additional requirements. Due to space limitations, we do not 
formalize constraints in this paper. 


Definition 1 (Typed model graph and morphism). Given two attributed 
graphs $M$ and $MM$, called model and meta-model, the typed model (graph) of 
$M$ is defined as $M^T = (M, type^M)$ with $type^M: M \to MM$ being an attributed 
graph morphism, called typing morphism¹. Given two typed models $M$ and $N$, 
an attributed graph morphism $f: M \to N$ is called typed model morphism if 
$type^N \circ f = type^M$. 
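For finite graphs, the condition of Definition 1 can be checked mechanically. A minimal sketch, assuming plain directed graphs without attributes, node and edge ids that are disjoint, and morphisms and typings given as dicts over these ids; the `Graph` class and `is_typed_model_morphism` are hypothetical helpers.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A plain directed graph; attributes are omitted in this sketch."""
    nodes: set[str]
    edges: dict[str, tuple[str, str]] = field(default_factory=dict)  # edge id -> (source, target)


def is_typed_model_morphism(m: Graph, n: Graph, type_m: dict[str, str],
                            type_n: dict[str, str], f: dict[str, str]) -> bool:
    """Check that f: M -> N is a graph morphism with type_N o f = type_M.

    f, type_m and type_n map node ids and edge ids alike.
    """
    # f must map every node of M to a node of N ...
    if any(f.get(v) not in n.nodes for v in m.nodes):
        return False
    # ... and every edge of M to an edge of N, preserving source and target.
    for e, (src, tgt) in m.edges.items():
        image = f.get(e)
        if image not in n.edges or n.edges[image] != (f[src], f[tgt]):
            return False
    # Typing compatibility: type_N(f(x)) == type_M(x) for all elements x of M.
    elements = m.nodes | set(m.edges)
    return all(type_n[f[x]] == type_m[x] for x in elements)
```

Remapping a single node already breaks target preservation of the edge attached to it, so such a map is rejected before the typing condition is even checked.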


[Figure: left, model graph M with nodes such as Authorising:State, :Region, PSCSystem:Class, :Pseudostate, S.1.0.0:State, S.1.0.1:State, :Transition, :Trigger and sendPScoordinatorCredentials:Operation; right, meta-model MM with classes such as StateMachine, Region, Vertex, Pseudostate, State, Transition, Class and Operation; the typing morphism type: M → MM maps the former onto the latter]

Fig. 2. Excerpt of a typed model graph. 


Example 1 (Typed model graph). The left-hand side of Fig. 2 shows the model 
graph of an excerpt from the model depicted in Fig. 1. The model graph is 
typed over the meta-model depicted on the right-hand side of Fig. 2, which is a 
simplified excerpt of the UML meta-model. Every node (and edge) of the model 
graph is mapped onto a node (or edge) of the type graph by the graph morphism 
$type^M: M \to MM$. 

¹ In the following, we usually omit the adjective "attributed". 


Typed models and morphisms as defined above form the category $\mathbf{AGraphs}_{\mathbf{ATG}}$ 
in [15]. This category has various useful properties: it is an adhesive HLR category 
using the class $\mathcal{M}$ of injective graph morphisms with isomorphic data mapping, 
and it has pushouts and pullbacks where at least one morphism is in $\mathcal{M}$. These 
constructions can be considered as generalized union and intersection of models, 
defined component-wise on nodes and edges such that they are structure-compatible. 
They are used to define the formal framework. 
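For the special case where models are plain sets of element ids and all morphisms in M are inclusions, the two constructions reduce to intersection and union. This sketch ignores edges and structure compatibility and only illustrates that intuition; `pullback` and `pushout` are hypothetical helper names.

```python
def pullback(a: set[str], b: set[str]) -> set[str]:
    """Pullback of two inclusions a -> X and b -> X: the common part of a and b."""
    return a & b


def pushout(k: set[str], a: set[str], b: set[str]) -> set[tuple[str, str]]:
    """Pushout of two inclusions k -> a and k -> b: a and b glued along k.

    Elements of k are shared; every other element is tagged with its origin,
    so equally named elements outside k are kept apart.
    """
    if not (k <= a and k <= b):
        raise ValueError("k must be included in both a and b")
    glued = {("k", x) for x in k}
    glued |= {("a", x) for x in a - k}
    glued |= {("b", x) for x in b - k}
    return glued
```

The tagging makes the "generalized" part of the union explicit: only elements in the shared interface are identified, which is why the gluing of {x, y} and {x, y, z} along {y} has four elements rather than three.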


3.2 Model Modifications 


Abstracting from the details of concrete model transformation approaches, we 
specify the change of models over time by model modifications. Each model 
modification describes the original model, an intermediate model obtained after 
performing all intended element deletions, and the resulting model obtained after 
performing all element additions. 


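Definition 2 (Model modification). Given two models $M_1$ and $M_2$, a 
(direct) model modification $M_1 \Rightarrow M_2$ is a span of injective morphisms 
$M_1 \xleftarrow{m_1} M_s \xrightarrow{m_2} M_2$. 

1. Two model modifications $M_1 \xleftarrow{m_{11}} M_{12} \xrightarrow{m_{12}} M_2$ and $M_2 \xleftarrow{m_{22}} M_{23} \xrightarrow{m_{23}} M_3$ 
are concatenated to the model modification $M_1 \xleftarrow{m_{11} \circ m_{13}} M_{13} \xrightarrow{m_{23} \circ m_{33}} M_3$ with $(m_{13}, m_{33})$ 
being the pullback of $m_{12}$ and $m_{22}$ (intersecting $M_{12}$ and $M_{23}$). 

2. Given two direct model modifications $m: M_1 \xleftarrow{m_1} M_s \xrightarrow{m_2} M_2$ and $p: P_1 \xleftarrow{p_1} P_s \xrightarrow{p_2} P_2$, 
$p$ can be embedded into $m$, written $e: p \to m$, if there are injective morphisms 
(also called embeddings) $e_1: P_1 \to M_1$, $e_s: P_s \to M_s$, and $e_2: P_2 \to M_2$ with 
$e_1 \circ p_1 = m_1 \circ e_s$ and $e_2 \circ p_2 = m_2 \circ e_s$. 

3. A sequence $M_0 \Rightarrow M_1 \Rightarrow \dots \Rightarrow M_n$ of direct model modifications is called a 
model modification and is denoted by $M_0 \Rightarrow^* M_n$. 

4. There are five special kinds of model modifications: 

(a) Model modification $M \xleftarrow{id_M} M \xrightarrow{id_M} M$ is called identical. 

(b) Model modification $\emptyset \leftarrow \emptyset \rightarrow \emptyset$ is called empty. 

(c) Model modification $\emptyset \leftarrow \emptyset \rightarrow M$ is called model creation. 

(d) Model modification $M \leftarrow \emptyset \rightarrow \emptyset$ is called model deletion. 

(e) $M_2 \xleftarrow{m_2} M_s \xrightarrow{m_1} M_1$ is called the inverse modification to $M_1 \xleftarrow{m_1} M_s \xrightarrow{m_2} M_2$. 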


In a direct model modification, model $M_s$ characterizes an intermediate 
model where all deletion actions have been performed but nothing has been 
added yet. Hence, $M_s$ is the intersection of $M_1$ and $M_2$. 
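With models as sets of element ids and both legs of a span as inclusions (an assumption that sidesteps renaming), a direct model modification and its concatenation can be sketched as follows. Concatenation computes the pullback of the two inner legs, which for inclusions is the intersection of the two intermediate models. The class `Modification` and its members are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    """A direct model modification M1 <- Ms -> M2 whose legs are inclusions."""
    m1: frozenset[str]
    ms: frozenset[str]
    m2: frozenset[str]

    def __post_init__(self) -> None:
        if not (self.ms <= self.m1 and self.ms <= self.m2):
            raise ValueError("Ms must be included in both M1 and M2")

    @property
    def deleted(self) -> frozenset[str]:
        """Elements of M1 that are not preserved in Ms."""
        return self.m1 - self.ms

    @property
    def created(self) -> frozenset[str]:
        """Elements of M2 that do not stem from Ms."""
        return self.m2 - self.ms

    def inverse(self) -> "Modification":
        """Swap both legs of the span: M2 <- Ms -> M1."""
        return Modification(self.m2, self.ms, self.m1)

    def then(self, other: "Modification") -> "Modification":
        """Concatenate self: M1 => M2 with other: M2 => M3 via the pullback (intersection)."""
        if self.m2 != other.m1:
            raise ValueError("modifications do not compose")
        return Modification(self.m1, self.ms & other.ms, other.m2)
```

Composing a modification with its inverse yields a span that deletes and re-creates the removed elements rather than the identical modification, which follows directly from intersecting the two intermediate models.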
[Figure: (a) model: excerpt of the composite state Authorising in concrete syntax with the states S.1.0.0, S.1.0.1 and S.1.0.2 and their transitions; (b) model graph: the corresponding abstract syntax graph with :Region, :Pseudostate, :State and :Transition nodes]


Fig. 3. Excerpt of a model modification 


Example 2 (Direct model modification). Figure 3 shows a model modification 
using our running example. While Fig. 3(a) focuses on the concrete model syntax, 
Fig. 3(b) shows the changing abstract syntax graph. Figure 3(a) depicts 
an excerpt of the composite state Authorising. The red transition is deleted 
while the green state and transitions are created. The model modification 
$m: M_1 \xleftarrow{m_1} M_s \xrightarrow{m_2} M_2$ is illustrated in Fig. 3(b). The red elements represent 
the set of nodes (and edges) $M_1 \setminus m_1(M_s)$ to be deleted. The set $M_2 \setminus m_2(M_s)$ 
describing the nodes (and edges) to be created is illustrated by the green elements. 
All other nodes (and edges) represent the intermediate model $M_s$. 


The double pushout approach to graph transformation [15] is a special kind 
of model modification: 


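Definition 3 (Rule application). Given a model $G$ and a model modification
$r: L \xleftarrow{l} K \xrightarrow{r} R$, called rule, with injective morphism $m: L \to G$, called
match, the rule application $G \Rightarrow_{r,m} H$ is defined by the following two pushouts:

$$
\begin{array}{ccccc}
L & \xleftarrow{\;l\;} & K & \xrightarrow{\;r\;} & R \\
{\scriptstyle m}\downarrow & \text{(PO1)} & \downarrow & \text{(PO2)} & \downarrow{\scriptstyle m'} \\
G & \longleftarrow & D & \longrightarrow & H
\end{array}
$$

Model $H$ is constructed in two passes: (1) $D := G \setminus m(L \setminus l(K))$, i.e., erase all model
elements that are to be deleted; (2) $H := D \cup m'(R \setminus r(K))$ such that a new copy of
all model elements that are to be created is added.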


Note that the first pushout above exists if $G \setminus m(L \setminus l(K))$ does not yield dangling
edges [15]. Consequently, the result of a rule application $G \Rightarrow_{r,m} H$ is a direct
model modification $G \leftarrow D \rightarrow H$.
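The two passes of Definition 3 can be sketched on untyped directed graphs as follows. This is a minimal illustration under stated assumptions, not the paper's or Henshin's implementation. It assumes that $K$ is a subgraph of both $L$ and $R$, so that $l$ and $r$ are inclusions sharing element ids, and that the match maps every node and edge id of $L$ injectively into $G$. The dangling-edge check corresponds to the existence condition of the first pushout. The rule and all element names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Untyped directed graph; edges map an edge id to (source, target)."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)


def apply_rule(G, L, K, R, match):
    """Double-pushout rule application G =>_{r,m} H on untyped graphs; returns (D, H)."""
    del_nodes = {match[n] for n in L.nodes - K.nodes}
    del_edges = {match[e] for e in set(L.edges) - set(K.edges)}
    # Dangling condition: no surviving edge of G may touch a deleted node.
    for e, (s, t) in G.edges.items():
        if e not in del_edges and (s in del_nodes or t in del_nodes):
            raise ValueError(f"dangling edge {e!r}: first pushout does not exist")
    # Pass (1): D := G \ m(L \ l(K))
    D = Graph(G.nodes - del_nodes,
              {e: st for e, st in G.edges.items() if e not in del_edges})
    # Pass (2): H := D ∪ m'(R \ r(K)), adding fresh copies of created elements
    used = G.nodes | set(G.edges)
    counter = itertools.count()

    def fresh():
        # Generate an id not yet used in G or H.
        while True:
            name = f"new{next(counter)}"
            if name not in used:
                used.add(name)
                return name

    m_prime = {x: match[x] for x in K.nodes | set(K.edges)}
    H = Graph(set(D.nodes), dict(D.edges))
    for n in sorted(R.nodes - K.nodes):
        m_prime[n] = fresh()
        H.nodes.add(m_prime[n])
    for e in sorted(set(R.edges) - set(K.edges)):
        s, t = R.edges[e]
        m_prime[e] = fresh()
        H.edges[m_prime[e]] = (m_prime[s], m_prime[t])
    return D, H


# Hypothetical rule: preserve src, create a node tgt and an edge t from src to tgt.
L = Graph({"src"})
K = Graph({"src"})
R = Graph({"src", "tgt"}, {"t": ("src", "tgt")})
G = Graph({"a"})
D, H = apply_rule(G, L, K, R, {"src": "a"})
print(sorted(H.nodes), H.edges)  # ['a', 'new0'] {'new1': ('a', 'new0')}
```

The returned pair $D$, $H$, together with the inclusions of $D$ into $G$ and $H$, is exactly the direct model modification $G \leftarrow D \rightarrow H$ mentioned above.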


3.3 Model Slicing 


In general, a model slice is an interesting part of a model comprising a given 
slicing criterion. It is up to a concrete slicing definition to specify which model 
parts are of interest. 
Definition 4 (Model slice). Given a model $M$ and a slicing criterion $C$ with
a morphism $c: C \to M$, a model slice $S = Slice(M, c)$ is a model $S$ such that
there are two morphisms $m: S \to M$ and $e: C \to S$ with $m \circ e = c$.
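The conditions of Definition 4 can be checked mechanically. The sketch below assumes that models are sets of element ids and morphisms are dicts, and it additionally requires injectivity so that slices behave as submodels (an assumption of this sketch). It tests whether $m \circ e = c$ holds for a candidate slice. The element names are hypothetical and loosely follow the running example.

```python
def is_injective_total(f, dom, cod):
    """f (a dict) is defined on all of dom, lands in cod, and is injective."""
    return (set(f) == set(dom)
            and set(f.values()) <= set(cod)
            and len(set(f.values())) == len(f))


def is_slice(M, C, c, S, m, e):
    """Check Definition 4 for a candidate slice S, with m: S -> M, e: C -> S, c: C -> M."""
    return (is_injective_total(c, C, M)
            and is_injective_total(m, S, M)
            and is_injective_total(e, C, S)
            and all(m[e[x]] == c[x] for x in C))


def identity(X):
    """Identity map on X, used here for inclusion morphisms."""
    return {x: x for x in X}


# Hypothetical element ids loosely following the running example.
M = {"init", "Idle", "S.1.0.1", "t1", "t2", "Other"}
C = {"S.1.0.1"}
S = {"init", "Idle", "S.1.0.1", "t1", "t2"}
print(is_slice(M, C, identity(C), S, identity(S), identity(C)))  # True
S_bad = S - {"S.1.0.1"}
print(is_slice(M, C, identity(C), S_bad, identity(S_bad), identity(C)))  # False
```

Definition 4 admits many slices for the same criterion; for example, $M$ itself always qualifies. Choosing among them is left to a concrete slicing definition.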


Note that each model slice $S = Slice(M, c)$ induces a model modification
$C \xleftarrow{id_C} C \xrightarrow{e} S$.
Fig. 4. Excerpt of two model slices


Example 3 (Model slice). Figure 4 depicts an excerpt of the model graph of
$M$ depicted in Fig. 1 and the two slices $S_{back} = Slice(M, c_{back})$ and
$S_{edit} = Slice(M, c_{edit})$. $S_{back}$ is the backward slice as informally described in Sect. 2.
$C_{back} = \{S.1.0.1\}$ is the first slicing criterion. The embedding $c_{back}(C_{back})$ is represented
by the gray-filled element, while the embedding $m_{back}(S_{back})$ is represented
by the blue-bordered elements. Model $e_{back}(C_{back})$ is illustrated by the gray-filled
state having a blue border, and $S_{back} \setminus e_{back}(C_{back})$ by the green-filled elements
having a blue border.

Let $S_{back}$ be the slicing criterion for the slice $S_{edit}$, i.e., $C_{edit} = S_{back}$ and
$c_{edit}(C_{edit}) = m_{back}(S_{back})$. $S_{edit}$ is the extracted editable submodel introduced
in Sect. 2 by use case B. Its embedding $m_{edit}(S_{edit})$ is represented by the blue- and
orange-bordered elements. Model $e_{edit}(C_{edit})$ is illustrated by the blue-bordered
elements, and $S_{edit} \setminus e_{edit}(C_{edit})$ by the green-filled elements having an orange
border.


3.4 Incremental Slice Update 


While working with a model slice, it may happen that the slicing criterion
has to be modified. The corresponding model slice can then be updated
incrementally. In practice, slicing criteria may be modified rather frequently,
e.g., when independent submodels of a large model are edited in cooperative
work.
Definition 5 (Slice update construction). Given a model slice $S_1 = Slice(M, C_1 \to M)$
and a direct model modification $c = C_1 \xleftarrow{c_1} C_s \xrightarrow{c_2} C_2$ of the slicing criterion,
slice $S_2 = Slice(M, C_2 \to M)$ can be constructed as follows:

1. Given slice $S_1$, we deduce the model modification $C_1 \xleftarrow{id_{C_1}} C_1 \xrightarrow{e_1} S_1$ and take
its inverse modification $S_1 \xleftarrow{e_1} C_1 \xrightarrow{id_{C_1}} C_1$.

2. Then we take the given model modification $c$ of the slicing criterion.

3. Finally, we take the model modification $C_2 \xleftarrow{id_{C_2}} C_2 \xrightarrow{e_2} S_2$ induced by slice
$S_2$.

All model modifications are concatenated, yielding the direct model modification
$S_1 \xleftarrow{e_1 \circ c_1} C_s \xrightarrow{e_2 \circ c_2} S_2$, called slice update construction (see also Fig. 6).
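The concatenated span can be evaluated on element sets. In the minimal sketch below, models are sets of ids and morphisms are dicts; both simplifications are assumptions. Ids k1, k, k2, and w are hypothetical: k1 leaves the criterion, k2 joins it, k stays, and w belongs to both slices without being part of the criterion. The sketch composes $e_1 \circ c_1$ and $e_2 \circ c_2$ and reports what the construction deletes and creates.

```python
def compose(f, g):
    """f ∘ g for maps given as dicts (g is applied first)."""
    return {x: f[g[x]] for x in g}


def slice_update_construction(S1, e1, c1, c2, e2, S2):
    """Deleted/created elements of S1 <-(e1∘c1)- Cs -(e2∘c2)-> S2."""
    left = compose(e1, c1)   # Cs -> S1
    right = compose(e2, c2)  # Cs -> S2
    deleted = set(S1) - set(left.values())
    created = set(S2) - set(right.values())
    return deleted, created


def identity(X):
    """Identity map on X, used for inclusions."""
    return {x: x for x in X}


# Hypothetical criterion change and slices (all embeddings are inclusions).
C1, C2, Cs = {"k1", "k"}, {"k", "k2"}, {"k"}
S1, S2 = {"k1", "k", "w"}, {"k", "k2", "w"}
deleted, created = slice_update_construction(
    S1, identity(C1), identity(Cs), identity(Cs), identity(C2), S2)
print(sorted(deleted), sorted(created))  # ['k1', 'w'] ['k2', 'w']
```

Element w is deleted and then created again, even though it lies in both slices. This is the effect that the incremental variant avoids.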


Example 4 (Slice update example). Figure 5 illustrates a slice update construction
with $S_{edit} = Slice(M, C_{edit} \to M)$ being the extracted submodel of our previous
example, illustrated by the red-dashed box. The modification $c: C_{edit} \xleftarrow{c_1} C_s \xrightarrow{c_2} C_{edit'}$
of the slicing criterion is depicted by the gray-filled elements. The
red-bordered elements represent the set $C_{edit} \setminus c_1(C_s)$ of elements removed from
the slicing criterion. The green-bordered elements form the set $C_{edit'} \setminus c_2(C_s)$
of elements added to the slicing criterion. $S_{edit'} = Slice(M, C_{edit'} \to M)$ is
the extracted submodel represented by the green-dashed box. Consequently, the
slice is updated by deleting all elements in $S_{edit} \setminus e_{edit}(c_1(C_s))$, represented by
the red-bordered and red- and white-filled elements, and adding all elements in
$S_{edit'} \setminus e_{edit'}(c_2(C_s))$, represented by the green-bordered and green- and white-filled
elements. Note that the white-filled elements are removed and added again.
This motivated us to consider the incremental slice updates defined below.
Fig. 5. Excerpt of an (incremental) slice update.
Definition 6 (Incremental slice update). Given $M$ and $C_1 \to M$ as in
Definition 4 as well as a direct model modification $C_1 \xleftarrow{c_1} C_s \xrightarrow{c_2} C_2$, model
slice $S_1 = Slice(M, C_1 \to M)$ is incrementally updated to model slice
$S_2 = Slice(M, C_2 \to M)$, yielding a direct model modification $S_1 \xleftarrow{s_1} S_s \xrightarrow{s_2} S_2$, called
incremental slice update from $S_1$ to $S_2$, with $s_1$ and $s_2$ being the pullback of
$m_1: S_1 \to M$ and $m_2: S_2 \to M$ (see also Fig. 6).
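For injective embeddings into $M$, the pullback in Definition 6 can be computed directly. Its elements are the pairs $(x, y)$ with $m_1(x) = m_2(y)$, so it is isomorphic to the intersection of the two images in $M$. The minimal sketch below again simplifies models to id sets and morphisms to dicts (an assumption). In the hypothetical data, w lies in both slices but outside the criteria.

```python
def pullback(m1, m2):
    """Pullback of injective maps m1: S1 -> M and m2: S2 -> M (dicts).

    Ss contains the pairs (x, y) with m1[x] == m2[y]; s1 and s2 are the two
    projections. For injective maps, Ss is isomorphic to m1(S1) ∩ m2(S2).
    """
    inverse_m2 = {v: y for y, v in m2.items()}  # well-defined since m2 is injective
    Ss = {(x, inverse_m2[v]) for x, v in m1.items() if v in inverse_m2}
    s1 = {pair: pair[0] for pair in Ss}
    s2 = {pair: pair[1] for pair in Ss}
    return Ss, s1, s2


def incremental_slice_update(S1, m1, S2, m2):
    """Deleted/created elements of the incremental update S1 <-s1- Ss -s2-> S2."""
    _, s1, s2 = pullback(m1, m2)
    return set(S1) - set(s1.values()), set(S2) - set(s2.values())


def identity(X):
    """Identity map on X, used for inclusions into M."""
    return {x: x for x in X}


# Hypothetical slices before and after a criterion change.
S1, S2 = {"k1", "k", "w"}, {"k", "k2", "w"}
deleted, created = incremental_slice_update(S1, identity(S1), S2, identity(S2))
print(sorted(deleted), sorted(created))  # ['k1'] ['k2']
```

Here the shared element w is kept untouched, because it lies in the pullback.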


Example 5 (Incremental slice update example). Consider $S_{edit}$ and $S_{edit'}$ of our
previous example and the model modification $S_{edit} \xleftarrow{s_1} S_s \xrightarrow{s_2} S_{edit'}$, where
$S_s$ is isomorphic to the intersection of $S_{edit}$ and $S_{edit'}$ in $M$,
i.e., $m_s: S_s \to m_{edit}(S_{edit}) \cap m_{edit'}(S_{edit'})$ with $m_s$ being an isomorphism due
to the pullback construction. $S_s$ is illustrated by the elements contained in the
intersection of the red- and green-dashed boxes in Fig. 5. In contrast to the slice
update construction of the previous example, the white-filled elements are not
affected by the incremental slice update.


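Ideally, the slice update construction in Definition 5 should not yield a different
update than the incremental one. However, this is not the case in general, since the
incremental update keeps as many model elements as possible, in contrast to the update
construction in Definition 5. In any case, both update constructions should be
compatible with each other, i.e., they should be in an embedding relation, as stated in
the following proposition.

$$
\begin{array}{ccccc}
C_1 & \xleftarrow{\;c_1\;} & C_s & \xrightarrow{\;c_2\;} & C_2 \\
{\scriptstyle e_1}\downarrow & & \downarrow{\scriptstyle e} & & \downarrow{\scriptstyle e_2} \\
S_1 & \xleftarrow{\;s_1\;} & S_s & \xrightarrow{\;s_2\;} & S_2 \\
 & {\scriptstyle m_1}\searrow & \downarrow{\scriptstyle m_s} & \swarrow{\scriptstyle m_2} & \\
 & & M & &
\end{array}
$$

Fig. 6. Incremental slice update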


Proposition 1 (Compatibility of slice update constructions). Given $M$
and $C_1 \to M$ as in Definition 4 as well as a direct model modification $C_1 \xleftarrow{c_1} C_s \xrightarrow{c_2} C_2$,
the model modification resulting from the slice update construction in
Definition 5 can be embedded into the incremental slice update from $S_1$ to $S_2$ (see
also Fig. 6).


Proof idea: Given an incremental slice update $S_1 \xleftarrow{s_1} S_s \xrightarrow{s_2} S_2$, it is the
pullback of $m_1: S_1 \to M$ and $m_2: S_2 \to M$. The slice update construction
yields $m_1 \circ e_1 \circ c_1 = m_2 \circ e_2 \circ c_2$. Due to pullback properties, there is a unique
embedding $e: C_s \to S_s$ with $s_1 \circ e = e_1 \circ c_1$ and $s_2 \circ e = e_2 \circ c_2$.²


4 Instantiation of the Formal Framework 


In this section, we present an instantiation of our formal framework which is 
inspired by the model slicing tool introduced in [8]. The basic idea of the app- 
roach is to create and incrementally update model slices by calculating and 
applying a special form of model patches, introduced and referred to as edit 
script in [17]. 


² This proof idea can be elaborated into a full proof in a straightforward manner.


A Formal Framework for Incremental Model Slicing 13 


4.1 Edit Scripts as Refinements of Model Modifications 


An edit script $\Delta_{M_1 \Rightarrow M_2}$ specifies how to transform a model $M_1$ into a model
$M_2$ in a stepwise manner. Technically, this is a data structure which comprises
a set of rule applications, partially ordered by an acyclic dependency graph. Its
nodes are rule applications and its edges are dependencies between them [17].
Models are represented as typed graphs as in Definition 1, and rule applications
are defined as in Definition 3. Hence, the semantics of an edit script is a set
of rule application sequences taking all possible orderings of rule applications
into account. Each sequence can be condensed into the application of one rule
following the concurrent rule construction in, e.g., [15]. Hence, an edit script
$\Delta_{M_1 \Rightarrow M_2}$ induces a set of model modifications of the form $M_1 \leftarrow M_s \rightarrow M_2$.
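The set of rule application sequences denoted by an edit script consists of all orderings compatible with the acyclic dependency graph, that is, its topological orderings. The following minimal sketch enumerates them. The application names and dependencies are hypothetical, and dependencies are given as a dict from each application to the applications that must precede it.

```python
def linearizations(nodes, deps):
    """Yield every sequence of rule applications consistent with deps.

    deps maps a rule application to the set of applications it depends on
    (these must be applied earlier). The dependency graph must be acyclic.
    """
    def extend(done, remaining):
        if not remaining:
            yield list(done)
            return
        for n in sorted(remaining):
            if deps.get(n, set()) <= set(done):
                done.append(n)
                yield from extend(done, remaining - {n})
                done.pop()

    yield from extend([], frozenset(nodes))


# Hypothetical edit script: p2 and p3 both depend on p1 but not on each other.
for seq in linearizations(["p1", "p2", "p3"], {"p2": {"p1"}, "p3": {"p1"}}):
    print(seq)
# ['p1', 'p2', 'p3']
# ['p1', 'p3', 'p2']
```

Enumeration is exponential in general. It is meant only to illustrate the semantics, not as a practical way to execute edit scripts.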

Given two models $M_1$ and $M_2$ as well as a set R of transformation rules for
this type of models, edit scripts are calculated in two basic steps [17]:

First, the corresponding elements in $M_1$ and $M_2$ are calculated using a model
matcher [18]. A basic requirement is that such a matching can be formally
represented as a (partial) injective morphism $c: M_1 \to M_2$. If so, the matching
morphism $c$ yields a unique model modification $m: M_1 \xleftarrow{m_1} M_s \xrightarrow{m_2} M_2$ (up to
isomorphism) with $m_2 = c|_{M_s}$. This means that $M_s$ always has to be a graph.

Second, an edit script is derived. Elementary model changes can be directly
derived from a model matching; elements in $M_1$ and $M_2$ which are not involved
in a correspondence can be considered as deleted and added, respectively [19].
The approach presented in [17] partitions the set of elementary changes such that
each partition represents the application of a transformation rule of the given
set R of transformation rules [20], and subsequently calculates the dependencies
between these rule applications [17], yielding an edit script $\Delta_{M_1 \Rightarrow M_2}$. Sequences
of rule applications of an edit script do not contain transient effects [17], i.e.,
pairs of change actions which cancel each other out (such as creating and later
deleting one and the same element). Thus, no change actions are factored out
by an edit script.
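The first step and the derivation of elementary changes can be sketched as follows. This is a minimal illustration, not the matcher of [18]. It assumes the matching $c$ is already given as a partial injective dict from ids of $M_1$ to ids of $M_2$. Then $M_s$ is the domain of $c$, $m_1$ is its inclusion, and $m_2$ is $c$ restricted to $M_s$. The sketch does not check that $M_s$ is a graph, and the element names are hypothetical. Grouping the resulting changes into rule applications, the second half of the step, is not covered here.

```python
def modification_from_matching(M1, M2, c):
    """Turn a partial injective matching c: M1 -> M2 (a dict) into the span
    M1 <-m1- Ms -m2-> M2 and the resulting elementary changes."""
    if len(set(c.values())) != len(c):
        raise ValueError("matching must be injective")
    Ms = set(c)                  # matched elements of M1
    m1 = {x: x for x in Ms}      # inclusion Ms -> M1
    m2 = dict(c)                 # m2 = c restricted to Ms
    deleted = set(M1) - Ms                  # unmatched in M1: deleted
    created = set(M2) - set(m2.values())    # unmatched in M2: added
    return Ms, m1, m2, deleted, created


# Hypothetical models: t1 disappears, t3 is new, states s1 and s2 correspond.
M1 = {"s1", "s2", "t1"}
M2 = {"s1'", "s2'", "t3"}
c = {"s1": "s1'", "s2": "s2'"}
Ms, m1, m2, deleted, created = modification_from_matching(M1, M2, c)
print(sorted(Ms), deleted, created)  # ['s1', 's2'] {'t1'} {'t3'}
```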


4.2 Model Slicing Through Slice-Creating Edit Scripts 


Edit scripts are also used to construct new model slices. Given a model $M$ and
a slicing criterion $C$, a slice-creating edit script $\Delta_{\varepsilon \Rightarrow S}$ is calculated which, when
applied to the empty model $\varepsilon$, yields the resulting slice $S$. The basic idea to
construct $\Delta_{\varepsilon \Rightarrow S}$ is to consider the model $M$ as created by an edit script $\Delta_{\varepsilon \Rightarrow M}$
applied to the empty model $\varepsilon$ and to identify a sub-script of $\Delta_{\varepsilon \Rightarrow M}$ which (at
least) creates all elements of $C$. The slice-creating edit script $\Delta_{\varepsilon \Rightarrow S}$ consists of
the subgraph of the dependency graph of the model-creating edit script $\Delta_{\varepsilon \Rightarrow M}$
containing (i) all nodes which create at least one model element in $C$, and (ii) all
required nodes and connecting edges according to the transitive closure of the
“required” relation, which is implied by dependencies between rule applications.
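The node selection for $\Delta_{\varepsilon \Rightarrow S}$ is a backward reachability computation over the "required" relation. The minimal sketch below represents the model-creating edit script by two hypothetical dicts: one records which elements each rule application creates, the other which applications it requires. The returned nodes induce the subgraph of the dependency graph, and the slice consists of the elements they create. All names are illustrative.

```python
def slice_creating_script(creates, requires, criterion):
    """Select the rule applications of a slice-creating edit script.

    creates:   rule application -> set of model elements it creates
    requires:  rule application -> set of applications it depends on
    criterion: set of model elements in the slicing criterion C
    """
    # (i) applications creating at least one element of C
    selected = {p for p, elems in creates.items() if elems & criterion}
    # (ii) transitive closure of the "required" relation
    stack = list(selected)
    while stack:
        p = stack.pop()
        for q in requires.get(p, set()):
            if q not in selected:
                selected.add(q)
                stack.append(q)
    return selected


# Hypothetical model-creating edit script.
creates = {
    "p1": {"region"},
    "p2": {"init"},
    "p3": {"Idle", "t_init_Idle"},
    "p4": {"Other"},
}
requires = {"p2": {"p1"}, "p3": {"p2"}, "p4": {"p1"}}
selected = slice_creating_script(creates, requires, {"Idle"})
slice_elements = set().union(*(creates[p] for p in selected))
print(sorted(selected))  # ['p1', 'p2', 'p3']
```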

Since the construction of edit scripts depends on a given set R of transformation
rules, a basic applicability condition is that all possible models and all
possible slices can be created by rules available in R. Given that this condition is
satisfied, model slicing through slice-creating edit scripts indeed behaves according
to Definition 4, i.e., a slice $S = Slice(M, C \to M)$ is obtained by applying
$\Delta_{\varepsilon \Rightarrow S}$ to the empty model: the resulting slice $S$ is a submodel of $M$ and a
supermodel of $C$. As we will see in Sect. 5, the behavior of a concrete model slicer, and
thus its intended purpose, is configured by the transformation rule set R.


4.3 Incremental Slicing Through Slice-Updating Edit Scripts 


To incrementally update a slice $S_1 = Slice(M, C_1 \to M)$ to become slice
$S_2 = Slice(M, C_2 \to M)$, we show that the approach presented in [8] constructs a
slice-updating edit script $\Delta_{S_1 \Rightarrow S_2}$ which, if applied to the current slice $S_1$, yields
$S_2$ in an incremental way.

Similar to the construction of slice-creating edit scripts, the basic idea is to
consider the model $M$ as created by a model-creating edit script $\Delta_{\varepsilon \Rightarrow M}$. The slice-updating
edit script must delete all elements in the set $S_1 \setminus S_2$ from the current slice $S_1$,
while adding all model elements in $S_2 \setminus S_1$. It is constructed as follows: Let $P_{S_1}$
and $P_{S_2}$ be the sets of rule applications which create all the elements in $S_1$ and
$S_2$, respectively. Next, the sets $P_{rem}$ and $P_{add}$ of rule applications in $\Delta_{\varepsilon \Rightarrow M}$ are
determined with $P_{rem} = P_{S_1} \setminus P_{S_2}$ and $P_{add} = P_{S_2} \setminus P_{S_1}$. Finally, the resulting
edit script $\Delta_{S_1 \Rightarrow S_2}$ contains (1) the rule applications in set $P_{add}$, with the same
dependencies as in $\Delta_{\varepsilon \Rightarrow M}$, and (2) for each rule application in $P_{rem}$, its inverse
rule application with reversed dependencies as in $\Delta_{\varepsilon \Rightarrow M}$. By construction, there
cannot be dependencies between rule applications of the two sets, so they can be
executed in arbitrary order.
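The construction of $\Delta_{S_1 \Rightarrow S_2}$ can be sketched on the level of rule application names. Assumptions: dependencies come as a dict from an application to the applications it requires, inverse applications are labelled with a hypothetical `inv(...)` prefix, and dependencies of added applications on applications shared by both slices are dropped, since those are already satisfied in $S_1$. For the inverses, reversing a dependency means that if $q$ depends on $p$, then $inv(q)$ must run before $inv(p)$.

```python
def slice_updating_script(requires, P_S1, P_S2):
    """Sketch of a slice-updating edit script from a model-creating one.

    requires:   rule application -> set of applications it depends on
    P_S1, P_S2: applications creating the elements of S1 and S2
    Returns (additions, removals); each maps an application to the set of
    applications in the same part that must be executed before it.
    """
    P_add = P_S2 - P_S1
    P_rem = P_S1 - P_S2
    # (1) added applications keep their dependencies within P_add
    additions = {p: requires.get(p, set()) & P_add for p in P_add}
    # (2) inverse applications with reversed dependencies
    removals = {
        f"inv({p})": {f"inv({q})" for q in P_rem if p in requires.get(q, set())}
        for p in P_rem
    }
    return additions, removals


# Hypothetical dependencies of a model-creating edit script.
requires = {"p2": {"p1"}, "p3": {"p2"}, "p4": {"p1"}, "p5": {"p4"}}
additions, removals = slice_updating_script(requires, {"p1", "p2", "p3"}, {"p1", "p4", "p5"})
print(sorted(additions["p5"]))  # ['p4']
print(removals["inv(p2)"])      # {'inv(p3)'}
```

Application p1 is shared by both slices and therefore appears in neither part, which is exactly what keeps the common elements untouched.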

In addition to the completeness of the set R of transformation rules for a
given modeling language (see Sect. 4.2), a second applicability condition is that,
for each rule $r$ in R, there must be an inverse rule $r^{-1}$ which reverts the effect
of $r$. Given that these conditions are satisfied and a slice-updating edit script
$\Delta_{S_1 \Rightarrow S_2}$ can be created, its application to $S_1$ indeed behaves according to the
incremental slice update as in Definition 6. This is so because, by construction,
none of the model elements in the intersection of $S_1$ and $S_2$ in $M$ is deleted by
the edit script $\Delta_{S_1 \Rightarrow S_2}$. Consequently, none of the elements in the intersection
of $C_1$ and $C_2$ in $M$, which is a subset of the intersection of $S_1$ and $S_2$, is deleted.


4.4 Implementation 


The framework instantiation has been implemented using a set of standard MDE 
technologies on top of the widely used Eclipse Modeling Framework (EMF), 
which employs an object-oriented implementation of graph-based models in 
which nodes and edges are represented as objects and references, respectively. 
Edit scripts are calculated using the model differencing framework SiLift [21], 
which uses EMF Compare [22] in order to determine the corresponding elements 
in a pair of models being compared with each other. A matching determined by 
EMF Compare fulfills the requirements presented in Sect. 4.1 since EMF Com- 
pare (a) delivers 1:1-correspondences between elements, thus yielding an injective 
mapping, and (b) implicitly matches edges if their respective source and target 
nodes are matched and if they have the same type (because EMF does not support
parallel edges of the same type in general), thus yielding an edge-preserving
mapping. Finally, transformation rules are implemented using the model trans- 
formation language and framework Henshin [23,24] which is based on graph 
transformation concepts. 


5 Solving the Motivating Examples 


In this section, we outline the configurations of two concrete model slicers which 
are based on the framework instantiation presented in Sect.4, and which are 
capable of solving the motivating examples introduced in Sect. 2. Each of these 
slicers is configured by a set of Henshin transformation rules which are used for 
the calculation of model-creating, and thus for the construction of slice-creating 
and slice-updating, edit scripts. The complete rule sets can be found at the 
accompanying website of this paper [25]. 


5.1 A State-Based Model Slicer 


Two of the creation rules which are used to configure a state-based model slicer 
as described in our first example of Sect.2 are shown in Fig.7. The rules are 
depicted in an integrated form: the left- and right-hand sides of a rule are merged 
into a unified model graph following the visual syntax of the Henshin model 
transformation language [23]. 


Fig. 7. Subset of the creation rules for configuring a state-based model slicer (rules createStateWithTransition(tgt_Name, r, src) and createPseudostate(p:Pseudostate, r:Region))


Most of the creation rules are of a similar form as the creation rule
createPseudostate, which simply creates a pseudostate and connects it with an
existing container. The key idea of this slicer configuration, however, is the
special creation rule createStateWithTransition, which creates a state together
with an incoming transition in a single step. To support the incremental updating
of slices, an inverse deletion rule is included in the overall set of transformation
rules for each creation rule.
Parts of the resulting model-creating edit script using these rules are shown in
Fig. 8. For example, rule application p3 creates the state Idle in the top-level
region of the state machine PSCSystem, together with an incoming transition
having the initial state of the state machine, created by rule application p2, as
source state. Thus, p3 depends on p2 since the initial state must be created first.
Similar dependency relationships arise for the creation of other states which are
created together with an incoming transition.

Fig. 8. Slice-creating edit script.

The effect of this configuration on the behavior of the model slicer is as follows
(illustrated here for the creation of a new slice): If state S.1.0.1 is selected as
slicing criterion, as in our motivating example, rule application p7 is included
in the slice-creating edit script since it creates that state. Implicitly, all rule
applications on which p7 transitively depends, i.e., all rule applications p1
to p6, are also included in the slice-creating edit script. Consequently, applying
the slice-creating edit script to an empty model yields a submodel of the state
machine of Fig. 1 which contains a transition path from its initial state to
state S.1.0.1, according to the desired behavior of the slicer.

A current limitation of our solution is that, for each state s of the slicing 
criterion, only a single transition path from the initial state to state s is sliced. 
This path is determined non-deterministically from the set of all possible paths 
from the initial state to state s. To overcome this limitation, rule schemes com- 
prising a kernel rule and a set of multi-rules (see, e.g., [26,27]) would have to 
be supported by our approach. Then, a rule scheme for creating a state with an 
arbitrary number of incoming transitions could be included in the configuration 
of our slicer, which in turn leads to the desired effect during model slicing. We
leave support for rule schemes for future work.


5.2 A Slicer for Extracting Editable Submodels 


In general, editable models adhere to a basic form of consistency which we assume 
to be defined by the effective meta-model of a given model editor [28]. The basic 
idea of configuring a model slicer for extracting editable submodels, adopted 
from [8], is that all creation and deletion rules preserve this level of consistency. 
Given an effective meta-model, such a rule set can be generated using the app- 
roach presented in [28] and its EMF-/UML-based implementation [29,30]. 

In our motivating example of Sect. 2, for instance, a consistency-preserving 
creation rule createTrigger creates an element of type Trigger and immediately 
connects it to an already existing operation of a class. The operation serves 
as the callEvent of this trigger and needs to be created first, which leads to 
a dependency in a model-creating edit script. Thus, if a trigger is included in 
the slicing criterion, the operation serving as callEvent of that trigger will be 
implicitly included in the resulting slice since it is created by the slice-creating 
edit script. 
6 Related Work


A large number of model slicers has been developed. Most of them work only 
with one specific type of models, notably state machines [4] and other types of 
behavioral models such as MATLAB/Simulink block diagrams [5]. Other sup- 
ported model types include UML class diagrams [31], architectural models [32] or 
system models defined using the SysML modeling language [33]. None of these 
approaches can be transferred to other (domain-specific) modeling languages, 
and they do not abstract from concrete slicing specifications. 

The only well-known, more generally applicable technique which is adaptable to
a given modeling language and slicing specification is Kompren [7]. In contrast
to our formal framework, however, Kompren does not abstract from the con- 
crete model modification approach and implementation technologies. It offers 
a domain-specific language based on the Kermeta model transformation lan- 
guage [34] to specify the behavior of a model slicer, and a generator which gen- 
erates a fully functioning model slicer from such a specification. When Kompren 
is used in the so-called active mode, slices are incrementally updated when the 
input model changes, according to the principle of incremental model transfor- 
mation [35]. In our approach, slices are incrementally updated when the slicing 
criterion is modified. As long as endogenous model transformations for con- 
structing slices are used only, Kompren could be easily extended to become an 
instantiation of our formal framework. 

Incremental slicing has also been addressed in [36], however, using a notion 
of incrementality which fundamentally differs from ours. The technique has been 
developed in the context of testing model-based delta-oriented software product 
lines [37]. Rather than incrementally updating an existing slice, the approach 
incrementally processes the product space of a product line, where each “product” 
is specified by a state machine model. As in software regression testing, the goal 
is to obtain retest information by utilizing differences between state machine 
slices obtained from different products. 

In a broader sense, related work can be found in the area of model splitting 
and model decomposition. The technique presented in [38] aims at splitting a 
model into submodels according to linguistic heuristics and using information 
retrieval techniques. The model decomposition approach presented in [39] consid- 
ers models as graphs and first determines strongly connected graph components 
from which the space of possible decompositions is derived in a second step. 
Both approaches are different from ours in that they produce a partitioning of 
an input model instead of a single slice. None of them supports the incremental 
updating of a model partitioning. 


7 Conclusion 


We presented a formal framework for defining model slicers that support incre- 
mental slice updates based on a general concept of model modifications. Incre- 
mental slice updates were shown to be equivalent to non-incremental ones. Fur- 
thermore, we presented a framework instantiation based on the concept of edit 
scripts defining application sequences of model transformation rules. This instantiation
was implemented by two concrete model slicers based on the Eclipse
Modeling Framework and the model differencing framework SiLift. 

As future work, we plan to investigate incremental updates of both the under- 
lying model and the slicing criterion. It is also worthwhile to examine the extent 
to which further concrete model slicers fit into our formal framework of incre- 
mental model slicing. For our own instantiation of this framework, we plan to 
cover further model transformation features such as rule schemes and applica- 
tion conditions, which will make the configuration of concrete model slicers more 
flexible and enable us to support further use cases and purposes. 
Abstract. Multiple (more than 2) model synchronization is ubiquitous
and important for MDE, but its theoretical underpinning has gained much
less attention than the binary case. Specifically, the latter was extensively
studied by the bx community in the framework of algebraic models for
update propagation called lenses. We take a step toward restoring the
balance and propose a notion of multiary delta lens. Besides multiarity, our
lenses feature reflective updates, when consistency restoration requires
some amendment of the update that violated consistency. We emphasize
the importance of various ways of lens composition for practical applications
of the framework, and prove several composition results.


1 Introduction 


Modelling normally results in a set of inter-related models presenting different 
views of the system. If one of the models changes and their joint consistency 
is violated, the related models should also be changed to restore consistency. 
This task is obviously of paramount importance for MDE, but its theoretical 
underpinning is inherently difficult and reliable practical solutions are rare. There 
are working solutions for file synchronization in systems like Git, but they are 
not applicable in the UML/EMF world of diagrammatic models. For the latter, 
much work has been done for the binary case (synchronizing two models) by the 
bidirectional transformation community (bx) [15], specifically, in the framework
of so-called delta lenses [3], but the multiary case (the number of models to be
synchronized is n > 2) has gained much less attention; cf. the energetic call to the
community in a recent paper by Stevens [16].

The context underlying bx is model transformation, in which one model in
the pair is considered as a transform of the other even though updates are
propagated in both directions (so-called round-tripping). Once we go beyond n = 2,
we immediately switch to a more general context of model interrelations beyond
model-to-model transformations. Such situations have been studied in the context
of multiview system consistency, but rarely in the context of an accurate
formal basis for update propagation. The present paper can be seen as an
adaptation of the (delta) lens-based update propagation framework for the multiview
consistency problem. We will call it multi-directional update propagation or mx,
following the bx pattern. Our contributions to mx are as follows.

We show with a simple example (Sect. 2) an important special feature of mx:
consistency restoration may require not only update propagation to other
models, but also an amendment of the very update that created the inconsistency (even
for the case of a two-view system!); thus, update propagation should, in general,
be reflective. Moreover, even if consistency can be restored without a reflective
amendment, there are cases when such reflection is still reasonable. It means
that Hippocraticness [15], a major requirement for classical bx, may have
less weight in the mx world. In Sect. 3, we provide a formal definition of multiary
(symmetric) lenses with reflection, and in Sect. 4 we define several operations of
lens composition producing complex lenses from simple ones. Specifically,
we show how n-ary lenses can be composed from n-tuples of asymmetric binary
lenses (Theorems 1 and 2), thus giving a partial solution to the challenging issue
of building mx synchronization via bx discussed by Stevens in [16]. We consider
lens composition results important for practical application of the framework. If
the tool builder has implemented a library of elementary synchronization modules
based on lenses and, hence, ensuring basic laws for change propagation, then
a complex module assembled from elementary lenses will automatically be a lens
and thus also enjoy the basic laws.


2 Example 


We will consider a simple example motivating our framework. Many formal con- 
structs below will be illustrated with the example (or its fragments) and referred 
to as Running example. 


Fig. 1. Multi-metamodel in UML


2.1 A Multimodel to Play With 


Suppose two data sources, whose schemas (we say metamodels) are shown in
Fig. 1 as class diagrams $M_1$ and $M_2$, record employment. The first source is
interested in employment of people living downtown; the second one is focused
on software companies and their recently graduated employees. In general, the
populations of classes Person and Company in the two sources can be different (they
can even be disjoint), but if a recently graduated downtowner works for a software
company, her appearance in both databases is very likely. Now suppose there is
an agency investigating traffic problems, which maintains its own data on
commuting between addresses (see schema $M_3$) computable by an obvious relational
join over $M_1$ and $M_2$. In addition, the agency supervises consistency of the two
sources and requires that if they both know a person p and a company c, then
they must agree on the employment record (p, c): it is either stored by both or
by neither of the sources. For this synchronization, it is assumed that persons
and companies are globally identified by their names. Thus, a triple of data sets
(we will say models) $A_1$, $A_2$, $A_3$, instantiating the respective metamodels, can
be either consistent (if the constraints described above are satisfied) or
inconsistent (if they are not). In the latter case, we normally want to change some or all
models to restore consistency. We will call a collection of models to be kept in
sync a multimodel.

To talk about constraints for multimodels, we need an accurate notation.
If $A$ is a model instantiating metamodel $M$ and $X$ is a class in $M$, we write
$X^A$ for the set of objects instantiating $X$ in $A$. Similarly, if $r: X_1 \to X_2$ is
an association in $M$, we write $r^A$ for the corresponding binary relation over
$X_1^A \times X_2^A$. For example, Fig. 2 presents a simple model $A_1$ instantiating $M_1$ with
$Person^{A_1} = \{p_1, p_1'\}$, $Company^{A_1} = \{c_1\}$, $\mathit{empl\text{-}er}^{A_1} = \{(p_1, c_1)\}$, and similarly
for attributes, e.g.,

$$lives^{A_1} = \{(p_1, a1), (p_1', a1)\} \subseteq Person^{A_1} \times Addr$$

($lives^{A_1}$ and also $name^{A_1}$ are assumed to be functions, and Addr is the
(model-independent) set of all possible addresses). The triple $(A_1, A_2, A_3)$ is a (state of a)
multimodel over the multimetamodel $(M_1, M_2, M_3)$, and we say it is consistent if
the two constraints specified below are satisfied. Constraint (C1) specifies mutual
consistency of models $A_1$ and $A_2$ in the sense described above; constraint (C2)
specifies consistency between the agency’s view of data and the two data sources:


$$\text{(C1)}\quad \text{if } p \in Person^{A_1} \cap Person^{A_2} \text{ and } c \in Company^{A_1} \cap Company^{A_2}, \text{ then } (p, c) \in \mathit{empl\text{-}er}^{A_1} \text{ iff } (c, p) \in \mathit{empl\text{-}ee}^{A_2}$$

$$\text{(C2)}\quad (lives^{A_1})^{-1} \bowtie \left(\mathit{empl\text{-}er}^{A_1} \cup (\mathit{empl\text{-}ee}^{A_2})^{-1}\right) \bowtie located^{A_2} \subseteq Commute^{A_3}$$

where $^{-1}$ refers to the inverse relations and $\bowtie$ denotes relational join
(composition); using subsetting rather than equality in (C2) assumes that there are other
data sources the agency can use. Note that constraint (C1) inter-relates two
component models of the multimodel, while (C2) involves all three components
and forces synchronization to be 3-ary.
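Both constraints can be checked directly once models are encoded as sets and dicts. Objects are identified by names, as the example assumes, and lives and located are functions. The data below is hypothetical and only loosely modelled on Fig. 2: John and Mary both live at a1, John works for IBM according to $A_1$, and Mary works for IBM according to $A_2$. The sketch reads the joins in (C2) as relational composition that yields (home address, work address) pairs.

```python
def check_c1(persons1, companies1, empl_er, persons2, companies2, empl_ee):
    """(C1): for shared persons p and companies c (identified by name),
    (p, c) in empl-er^{A1} iff (c, p) in empl-ee^{A2}. Returns violations."""
    return {
        (p, c)
        for p in persons1 & persons2
        for c in companies1 & companies2
        if ((p, c) in empl_er) != ((c, p) in empl_ee)
    }


def check_c2(lives, empl_er, empl_ee, located, commute):
    """(C2): (lives)^-1 ⋈ (empl-er ∪ empl-ee^-1) ⋈ located ⊆ Commute.
    lives and located are dicts; returns the missing commute pairs."""
    employment = set(empl_er) | {(p, c) for (c, p) in empl_ee}
    derived = {(lives[p], located[c]) for (p, c) in employment
               if p in lives and c in located}
    return derived - set(commute)


# Hypothetical multimodel data (A1: persons, lives, empl-er; A2: located, empl-ee).
persons1, companies1 = {"John", "Mary"}, {"IBM"}
empl_er = {("John", "IBM")}
lives = {"John": "a1", "Mary": "a1"}
persons2, companies2 = {"Mary"}, {"IBM", "Google"}
empl_ee = {("IBM", "Mary")}
located = {"IBM": "a15", "Google": "a10"}
commute = set()  # A3 records no commuting pairs yet

print(check_c1(persons1, companies1, empl_er, persons2, companies2, empl_ee))  # {('Mary', 'IBM')}
print(check_c2(lives, empl_er, empl_ee, located, commute))  # {('a1', 'a15')}
```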

It is easy to see that the multimodel $A_{1,2,3}$ in Fig. 2 is "two-times" inconsistent: (C1) is violated as both $A_1$ and $A_2$ know Mary and IBM, and $(\text{IBM}, \text{Mary}) \in \textit{empl-ee}^{A_2}$ but $(\text{Mary}, \text{IBM}) \notin \textit{empl-er}^{A_1}$; (C2) is violated as $A_1$ and $A_2$ show a commuting pair $(a1, a15)$ not recorded in $A_3$. We will discuss consistency restoration in the next subsection, but first we need to discuss an important part of the multimodel that has been held implicit so far: traceability or correspondence mappings.
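The two violations can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical set-based encoding of Fig. 2 in which persons and companies are identified by their names (so corr-links are implicit) and $A_3$ records no commutes yet; the helpers inverse, compose, check_c1 and check_c2 are illustrative names only.

```python
# Hypothetical set-based encoding of the multimodel in Fig. 2.
# Objects are identified by name, so corr-links between A1 and A2 are implicit.

def inverse(r):
    """Inverse relation r^-1."""
    return {(y, x) for (x, y) in r}

def compose(r, s):
    """Relational join (composition): {(x, z) | (x, y) in r and (y, z) in s}."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def check_c1(a1, a2):
    """(C1): for persons and companies known to both A1 and A2,
    (p, c) is in empl-er^A1 iff (c, p) is in empl-ee^A2."""
    persons = a1["Person"] & a2["Person"]
    companies = a1["Company"] & a2["Company"]
    return all(((p, c) in a1["empl_er"]) == ((c, p) in a2["empl_ee"])
               for p in persons for c in companies)

def check_c2(a1, a2, a3):
    """(C2): (lives^A1)^-1 ⋈ (empl-er^A1 ∪ (empl-ee^A2)^-1) ⋈ located^A2 ⊆ Commute^A3."""
    employment = a1["empl_er"] | inverse(a2["empl_ee"])
    derived = compose(compose(inverse(a1["lives"]), employment), a2["located"])
    return derived <= a3["Commute"]

a1 = {"Person": {"John", "Mary"}, "Company": {"IBM"},
      "empl_er": {("John", "IBM")},
      "lives": {("John", "a1"), ("Mary", "a1")}}
a2 = {"Person": {"Mary"}, "Company": {"IBM", "Google"},
      "empl_ee": {("IBM", "Mary")},
      "located": {("IBM", "a15"), ("Google", "a10")}}
a3 = {"Commute": set()}  # assumption: A3 records no commutes yet

print(check_c1(a1, a2), check_c2(a1, a2, a3))  # False False
```

Adding ("Mary", "IBM") to a1["empl_er"] and ("a1", "a15") to a3["Commute"], the two repairs used in Sect. 3.2 to obtain a consistent multimodel, makes both checks return True.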
Fig. 2. A(n inconsistent) multimodel $A^{\dagger}$ over the multi-metamodel in Fig. 1


Indeed, classes $\textit{Person}^{A_1}$ and $\textit{Person}^{A_2}$ are interrelated by a correspondence relation linking persons with the same name, and similarly for $\textit{Company}$. These correspondence links (we will write corr-links) may be kept implicit as they can always be restored. More important is to maintain corr-links between $\textit{Commute}^{A_3}$ and $\textit{empl-er}^{A_1} \cup \textit{empl-ee}^{A_2}$. Indeed, class $\textit{Commute}$ together with its two attributes can be seen as a relation, and this relation can be instantiated by a multirelation, as people living at the same address can work for companies located at the same address. If one of these $\textit{Commute}$-objects is deleted, and this deletion is to be propagated to models $A_{1,2}$, we need corr-links to know which employment links are to be deleted. Hence, it makes sense to establish such links when objects are added to $\textit{Commute}^{A_3}$, and use them later for deletion propagation.

Importantly, for given models $A_{1,2,3}$, there may be several different correspondence mappings: the same $\textit{Commute}$-object can correspond to different commute-links over $A_1$ and $A_2$. In fact, multiplicity of possible corr-specifications is a general phenomenon that can only be avoided if absolutely reliable keys are available; e.g., if we suppose that persons and companies can always be uniquely identified by names, then corrs between these classes are unique. But if keys (e.g., person names) are not absolutely reliable, we need a separate procedure of model matching or alignment that has to establish whether objects $p_1' \in \textit{Person}^{A_1}$ and $p_2' \in \textit{Person}^{A_2}$, both named Mary, represent the same real-world object. The constraints we declared above implicitly involve corr-links; e.g., the formula for (C1) is syntactic sugar for the following formal statement: if there are corr-links $p = (p_1, p_2)$ and $c = (c_1, c_2)$ with $p_i \in \textit{Person}^{A_i}$, $c_i \in \textit{Company}^{A_i}$ ($i = 1, 2$), then $(p_1, c_1) \in \textit{empl-er}^{A_1}$ iff $(c_2, p_2) \in \textit{empl-ee}^{A_2}$. A precise formal account of this discussion can be found in [10].
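Key-based alignment as described above can be sketched in a few lines. This Python sketch is only illustrative: the object ids and the names dictionary are hypothetical, and matching on equal names stands in for whatever alignment procedure a real tool would use.

```python
# Hypothetical key-based model matching: candidate corr-links are pairs of
# objects (one per model) whose key values (here: names) coincide.

def align_by_key(objs1, objs2, key):
    """Return all candidate corr-links (o1, o2) with key[o1] == key[o2]."""
    return {(o1, o2) for o1 in objs1 for o2 in objs2 if key[o1] == key[o2]}

def is_reliable(objs, key):
    """A key is reliable within a model if no two objects share a key value."""
    return len({key[o] for o in objs}) == len(objs)

names = {"p1": "John", "p1'": "Mary", "p2'": "Mary"}
persons1, persons2 = {"p1", "p1'"}, {"p2'"}
print(align_by_key(persons1, persons2, names))  # {("p1'", "p2'")}
```

If $A_1$ contained a second person named Mary, is_reliable would return False and align_by_key would return two candidate corr-links for p2', which is exactly the situation where a separate alignment procedure is needed.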

Thus, a multimodel is actually a tuple $A = (A_1, A_2, A_3, R)$ where $R$ is a collection of correspondence relations over the sets involved. This $R$ is implicit in Fig. 2 since in this very special case it can be restored. Consistency of a multimodel is a property of the entire 4-tuple $A$ rather than of its 3-tuple carrier $(A_1, A_2, A_3)$.


2.2 Synchronization via Update Propagation 


There are several ways to restore consistency of the multimodel in Fig. 2 w.r.t. constraint (C1). We may delete Mary from $A_1$, or delete her employment with IBM from $A_2$, or even delete IBM from $A_2$. We can also change Mary's employment from IBM to Google, which will restore (C1) as $A_1$ does not know Google. Similarly, we can delete John's record from $A_1$, and then Mary's employment with IBM in $A_2$ would not violate (C1). As the number of constraints and of the elements they involve increases, the number of consistency restoration variants grows fast.

The range of possibilities can be substantially reduced if we take into account the history of creating the inconsistency and consider not only the inconsistent state $A^{\dagger}$ but also the update $u\colon A \to A^{\dagger}$ that created it (assuming that $A$ is consistent). For example, suppose that initially model $A_1$ contained record (Mary, IBM) (and $A_3$ contained the $(a1, a15)$-commute), and the inconsistency appeared after Mary's employment with IBM was deleted in $A_1$. Then it is reasonable to restore consistency by deleting this employment record in $A_2$ too; we say that the deletion was propagated from $A_1$ to $A_2$. If the inconsistency appeared after adding the (IBM, Mary)-employment to $A_2$, then it is reasonable to restore consistency by adding such a record to $A_1$. Although propagating deletions/additions to deletions/additions is typical, there are non-monotonic cases too. Let us assume that Mary and John are spouses (they live at the same address), and that IBM follows an exotic policy prohibiting spouses from working together. Then we can interpret the addition of the (IBM, Mary)-record to $A_2$ as swapping the family member working for IBM, and then (John, IBM) is to be deleted from $A_1$.
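A history-aware policy for (C1) can be sketched as a function from an update of $A_1$ to an update of $A_2$. The Python sketch below is a hypothetical illustration covering only the monotonic case (additions to additions, deletions to deletions); the non-monotonic spouse policy would need an additional rule. Updates are assumed to be given as sets of added and deleted (person, company) pairs.

```python
# Hypothetical propagation policy for (C1): an update of empl-er in A1, given as
# sets of added and deleted (person, company) pairs, is translated into an update
# of empl-ee in A2 (additions to additions, deletions to deletions).

def propagate_a1_to_a2(added, deleted, a2_persons, a2_companies):
    """Return (added, deleted) (company, person) pairs for A2's empl-ee.
    Only links between objects that A2 knows are propagated, since (C1)
    constrains nothing else."""
    def known(p, c):
        return p in a2_persons and c in a2_companies
    return ({(c, p) for (p, c) in added if known(p, c)},
            {(c, p) for (p, c) in deleted if known(p, c)})

# Mary's employment with IBM is deleted in A1: delete it in A2 as well.
print(propagate_a1_to_a2(set(), {("Mary", "IBM")}, {"Mary"}, {"IBM", "Google"}))
# (set(), {('IBM', 'Mary')})
```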

Now let's consider how updates to and from model $A_3$ may be propagated. As mentioned above, traceability/correspondence links play a crucial role here. If additions to $A_1$ or $A_2$ or both create a new commute, the latter has to be added to $A_3$ (together with its corr-links) due to constraint (C2). In contrast, if a new commute is added to $A_3$, we change nothing in $A_{1,2}$ as (C2) only requires inclusion. If a commute is deleted from $A_3$, and it is traced to a corresponding employment in $\textit{empl-er}^{A_1} \cup \textit{empl-ee}^{A_2}$, then this employment is deleted. (Of course, there are other ways to remove a commute derivable over $A_1$ and $A_2$.) Finally, if a commute-generating employment in $\textit{empl-er}^{A_1} \cup \textit{empl-ee}^{A_2}$ is deleted, the respective commute in $A_3$ is deleted too. Clearly, many of the propagation policies above, although formally correct, may contradict the real-world changes and hence may need correction, but this is a common problem of most automatic synchronization approaches, which have to make guesses in order to resolve the non-determinism inherent in consistency restoration.
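The role of corr-links in deletion propagation from $A_3$ can be illustrated as follows. In this hypothetical Python sketch, each Commute-object carries an id and a trace to the employment link it was derived from; the ids k1 and k2 and the trace format are assumptions, not part of the formalism.

```python
# Hypothetical traceability: each Commute object in A3 (keyed by an object id)
# is linked to the employment link it was derived from, tagged with the model
# that holds it ("A1" for empl-er, "A2" for empl-ee).

def propagate_commute_deletion(deleted_ids, trace):
    """Return, per model, the employment links to delete in A1 and A2."""
    deletions = {"A1": set(), "A2": set()}
    for cid in deleted_ids:
        if cid in trace:  # untraced commutes come from other sources: no effect
            model, link = trace[cid]
            deletions[model].add(link)
    return deletions

# Two Commute objects with the same (from, to) = (a1, a15) pair: the corr-links,
# not the address pair, tell which employment has to go.
trace = {"k1": ("A1", ("John", "IBM")), "k2": ("A2", ("IBM", "Mary"))}
print(propagate_commute_deletion({"k2"}, trace))
# {'A1': set(), 'A2': {('IBM', 'Mary')}}
```

Both Commute-objects denote the same address pair, which is why the pair alone could not determine the employment to delete; adding a commute to $A_3$ is not handled here because, as stated above, it changes nothing in $A_{1,2}$.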


2.3 Reflective Update Propagation 


An important feature of the update propagation scenarios above is that consistency could be restored without changing the model whose update caused the inconsistency. However, this is not always desirable. Suppose again that the violation of constraint (C1) in the multimodel in Fig. 2 was caused by adding a new person Mary to $A_1$, e.g., as a result of Mary's moving downtown. Now both models know both Mary and IBM, and thus either employment record (Mary, IBM) is to be added to $A_1$, or record (IBM, Mary) is to be removed from $A_2$. Either of the variants is possible, but in our context, adding (Mary, IBM) to $A_1$ seems more likely and less specific than deleting (IBM, Mary) from $A_2$. Indeed, if Mary has just moved downtown, the data source $A_1$ simply may not have completed her record yet. Deletion of (IBM, Mary) from $A_2$ seems to be a different event unless there are strong causal dependencies between moving downtown and working for IBM. Thus, an update policy that would keep $A_2$ unchanged but amend the addition of Mary to $A_1$ with further automatic addition of her employment with IBM (as per model $A_2$) seems reasonable. This means that updates can be reflectively propagated (we also say self-propagated).

Of course, self-propagation does not necessarily mean non-propagation to other directions. Consider the following case: model $A_1$ initially only contains the (John, IBM) record and is consistent with $A_2$ shown in Fig. 2. Then record (Mary, Google) was added to $A_1$, which thus became inconsistent with $A_2$. To restore consistency, (Google, Mary) is to be added to $A_2$ (the update is propagated from $A_1$ to $A_2$), and (Mary, IBM) is to be added to $A_1$ as discussed above (i.e., the addition of (Mary, Google) is amended or self-propagated).
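The scenario above can be sketched as a single propagation step from $A_1$ that returns both an update for $A_2$ and an amendment of the original update. This Python sketch is hypothetical: it handles only additions and only constraint (C1), and it encodes an update as sets of added persons, companies and empl-er links.

```python
# Hypothetical reflective propagation from A1 for constraint (C1).
# An A1 update adds persons, companies and empl-er links; the result is the
# update for A2 plus an amendment of the A1 update (self-propagation).

def ppg_from_a1(a1, a2, added):
    """Return (links to add to A2's empl-ee, links amending A1's empl-er)."""
    persons = a1["Person"] | added["Person"]
    companies = a1["Company"] | added["Company"]
    empl_er = a1["empl_er"] | added["empl_er"]
    # forward propagation: new A1 links between objects that A2 knows
    to_a2 = {(c, p) for (p, c) in added["empl_er"]
             if p in a2["Person"] and c in a2["Company"]}
    # amendment: A2 links that the updated A1 can see but does not record
    amendment = {(p, c) for (c, p) in a2["empl_ee"]
                 if p in persons and c in companies and (p, c) not in empl_er}
    return to_a2, amendment

a1 = {"Person": {"John"}, "Company": {"IBM"}, "empl_er": {("John", "IBM")}}
a2 = {"Person": {"Mary"}, "Company": {"IBM", "Google"}, "empl_ee": {("IBM", "Mary")}}
added = {"Person": {"Mary"}, "Company": {"Google"}, "empl_er": {("Mary", "Google")}}
print(ppg_from_a1(a1, a2, added))  # ({('Google', 'Mary')}, {('Mary', 'IBM')})
```

Feeding an empty update returns two empty sets, i.e., idle updates are propagated to idle updates, which is the Stability law of Definition 5.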

A general schema of update propagation including reflection is shown in Fig. 3. We begin with a consistent multimodel $(A_1\ldots A_n, R)$¹, one of whose members is updated by $u_i\colon A_i \to A_i'$. The propagation operation, based on a priori defined propagation policies as sketched above, produces:

(a) updates on all other models $u_j'\colon A_j \to A_j''$, $1 \le j \ne i \le n$;
(b) a reflective update $u_i'\colon A_i' \to A_i''$;
(c) a new correspondence specification $R''$ such that the updated multimodel $(A_1''\ldots A_n'', R'')$ is consistent.

Fig. 3. Update propagation pattern


To distinguish given data from those produced by the operation, the former 
are shown with framed nodes and solid lines in Fig.3 while the latter are non- 
framed and dashed. Below we introduce an algebraic model encompassing several 
operations and algebraic laws formally modelling situations considered so far. 


3 Multidirectional Update Propagation and Delta Lenses 


A delta-based mathematical model for bx is well known under the name of delta lenses; below we will simply say lens. There are two main variants: asymmetric lenses, when one model is a view of the other and hence does not have any private information, and symmetric lenses, when both sides have private data not visible on the other side [2,3,6]. In this section we will develop a framework that generalizes the idea to any $n \ge 2$ and includes reflective updates.


¹ Here we first abbreviate $(A_1, \ldots, A_n)$ by $(A_1\ldots A_n)$, and then write $(A_1\ldots A_n, R)$ for $((A_1\ldots A_n), R)$. We will apply this style in other similar cases, and write, e.g., $i \in 1\ldots n$ for $i \in \{1, \ldots, n\}$ (this will also be written as $i \le n$).
3.1 Background: Graphs and Categories


We reproduce well-known definitions to fix our notation. A (directed multi-)graph $G$ consists of a set $G^{\bullet}$ of nodes and a set $G^{\blacktriangleright}$ of arrows equipped with two functions $s, t\colon G^{\blacktriangleright} \to G^{\bullet}$ that give arrow $a$ its source $s(a)$ and target $t(a)$ nodes. We write $a\colon N \to N'$ if $s(a) = N$ and $t(a) = N'$, and $a\colon N \to \_$ or $a\colon \_ \to N'$ if only one of these conditions is given. Correspondingly, expressions $G^{\blacktriangleright}(N, N')$, $G^{\blacktriangleright}(N, \_)$, $G^{\blacktriangleright}(\_, N')$ denote sets of, resp., all arrows from $N$ to $N'$, all arrows from $N$, and all arrows into $N'$.

A (small) category is a graph whose arrows are associatively composable and in which every node has a special identity loop, which is the unit of the composition. In more detail, given two consecutive arrows $a_1\colon \_ \to N$ and $a_2\colon N \to \_$, we denote the composed arrow by $a_1;a_2$. The identity loop of node $N$ is denoted by $\mathit{id}_N$, and equations $a_1;\mathit{id}_N = a_1$ and $\mathit{id}_N;a_2 = a_2$ are to hold. A functor is a mapping of nodes and arrows from one category to another which respects sources and targets, as well as identities and composition. Given a tuple of categories $(\mathbf{A}_1\ldots\mathbf{A}_n)$, their product is a category $\mathbf{A}_1 \times \cdots \times \mathbf{A}_n$ whose objects are tuples $(A_1\ldots A_n) \in \mathbf{A}_1^{\bullet} \times \cdots \times \mathbf{A}_n^{\bullet}$, and arrows from $(A_1\ldots A_n)$ to $(A_1'\ldots A_n')$ are tuples of arrows $(u_1\ldots u_n)$ with $u_i\colon A_i \to A_i'$ for all $i \in 1\ldots n$.




3.2 Model Spaces and Correspondences 


Basically, a model space is a category whose nodes are called model states or just models, and whose arrows are (directed) deltas or updates. For an arrow $u\colon A \to A'$, we treat $A$ as the state of the model before update $u$, $A'$ as the state after the update, and $u$ as an update specification. Structurally, it is a specification of correspondences between $A$ and $A'$. Operationally, it is an edit sequence (edit log) that changed $A$ to $A'$. The formalism does not prescribe what updates are, but assumes that they form a category, i.e., there may be different updates from state $A$ to state $A'$; updates are composable; and idle updates $\mathit{id}_A\colon A \to A$ (doing nothing) are the units of the composition.

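In addition, we require every model space $\mathbf{A}$ to be endowed with a family $(K^{\heartsuit}_A)_{A \in \mathbf{A}^{\bullet}}$ of binary relations $K^{\heartsuit}_A \subseteq \mathbf{A}^{\blacktriangleright}(\_, A) \times \mathbf{A}^{\blacktriangleright}(A, \_)$ indexed by objects of $\mathbf{A}$, and specifying non-conflicting or compatible consecutive updates. Intuitively, an update $u$ into $A$ is compatible with an update $u'$ from $A$ if $u'$ does not revert/undo anything done by $u$, e.g., it does not delete/create objects created/deleted by $u$, or re-modify attributes modified by $u$ (see [14] for a detailed discussion). Formally, we only require $(u, \mathit{id}_A) \in K^{\heartsuit}_A$ and $(\mathit{id}_A, u') \in K^{\heartsuit}_A$ for all $A \in \mathbf{A}^{\bullet}$, $u \in \mathbf{A}^{\blacktriangleright}(\_, A)$ and $u' \in \mathbf{A}^{\blacktriangleright}(A, \_)$.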
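A model space with a compatibility relation can be sketched concretely when models are sets of facts. In the hypothetical Python sketch below, an update is determined by its net delta, so there is at most one update between two states (a simplification: the formalism allows several), and compatibility only covers creation and deletion of facts, not attribute re-modification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    """An update u: source -> target between models encoded as frozensets of facts.
    Simplifying assumption: an update is determined by its net delta."""
    source: frozenset
    target: frozenset

    @property
    def added(self):
        return self.target - self.source

    @property
    def deleted(self):
        return self.source - self.target

def identity(model):
    """Idle update id_A: A -> A."""
    return Update(model, model)

def compose(u, v):
    """Sequential composition u;v, defined only for consecutive updates."""
    if u.target != v.source:
        raise ValueError("updates are not consecutive")
    return Update(u.source, v.target)

def compatible(u, v):
    """(u, v) in K_A: v (from A) does not undo u (into A), i.e. it neither
    deletes facts that u created nor re-creates facts that u deleted."""
    return (u.target == v.source
            and not (v.deleted & u.added)
            and not (v.added & u.deleted))

A = frozenset({("empl", "John", "IBM")})
u = Update(A, A | {("empl", "Mary", "IBM")})
undo = Update(u.target, A)
print(compatible(u, undo), compose(u, undo) == identity(A))  # False True
```

The identity laws hold by construction, both memberships required of idle updates hold, and an update that deletes exactly what the previous one created is rejected.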


Definition 1 (Model spaces). A model space is a pair $\mathbf{A} = (|\mathbf{A}|, K^{\heartsuit}_{\mathbf{A}})$ with $|\mathbf{A}|$ a category (the carrier) of models and updates and $K^{\heartsuit}_{\mathbf{A}}$ a family as specified above. A model space functor from $\mathbf{A}$ to $\mathbf{B}$ is a functor $F\colon |\mathbf{A}| \to |\mathbf{B}|$ such that $(u, u') \in K^{\heartsuit}_A$ implies $(F(u), F(u')) \in K^{\heartsuit}_{F(A)}$. We will denote model spaces and their carriers by the same symbol and often omit explicit mention of $K^{\heartsuit}$.
In the sequel, we will work with families of model spaces indexed by a finite set $I$, whose elements can be seen as space names. To simplify notation, we will assume that $I = \{1, \ldots, n\}$, although ordering will not play any role in our framework. Given a tuple of model spaces $\mathbf{A}_1, \ldots, \mathbf{A}_n$, we will refer to objects and arrows of the product category $\mathbf{A}_1 \times \cdots \times \mathbf{A}_n$ as model tuples and update tuples or, sometimes, as discrete multimodels/multiupdates.


Definition 2 (Multispace/Multimodels). Let $n \ge 2$ be a natural number.


(i) An n-ary multimodel space, or just an n-ary multispace, $\mathcal{A}$ is given by a family of model spaces $\partial\mathcal{A} = (\mathbf{A}_1, \ldots, \mathbf{A}_n)$ called the boundary of $\mathcal{A}$, and a set $\mathcal{A}^{\bowtie}$ of elements called corrs, along with a family of functions $(\partial_i\colon \mathcal{A}^{\bowtie} \to \mathbf{A}_i^{\bullet})_{i \le n}$ providing every corr $R$ with its boundary $\partial R = (\partial_1 R \ldots \partial_n R)$, i.e., a tuple of models taken from the multispace boundary, one model per space. Intuitively, a corr is understood as a consistent correspondence specification interrelating models from its boundary (and for this paper, all corrs are assumed consistent).

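Given a model tuple $(A_1\ldots A_n)$, we write $\mathcal{A}^{\bowtie}(A_1\ldots A_n)$ for the set of all corrs $R$ with $\partial R = (A_1\ldots A_n)$; we call models $A_i$ feet of $R$. Respectively, spaces $\mathbf{A}_i$ are feet of $\mathcal{A}$, and we write $\partial_i\mathcal{A}$ for $\mathbf{A}_i$.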

(ii) An (aligned consistent) multimodel over a multispace $\mathcal{A}$ is a model tuple $(A_1\ldots A_n)$ along with a corr $R \in \mathcal{A}^{\bowtie}(A_1\ldots A_n)$ relating the models. A multimodel update $u\colon (A_1\ldots A_n, R) \to (A_1'\ldots A_n', R')$ is a tuple of updates $(u_1\colon A_1 \to A_1', \ldots, u_n\colon A_n \to A_n')$.


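Note that any corr $R$ uniquely defines a multimodel via the corr's boundary function $\partial$. We will also need to identify the set of all corrs for some fixed $A_i \in \mathbf{A}_i^{\bullet}$ for a given $i$:

$$\mathcal{A}^{\bowtie}(A_i, \_) \overset{\text{def}}{=} \{R \in \mathcal{A}^{\bowtie} \mid \partial_i R = A_i\}.$$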


The running example of Sect. 2 gives rise to a 3-ary multimodel space. For $i \le 3$, space $\mathbf{A}_i$ consists of all models instantiating metamodel $M_i$ in Fig. 1 and their updates. To get a consistent multimodel $(A_1A_2A_3, R)$ from the one shown in Fig. 2, we can add to $A_1$ an empl-er-link connecting Mary to IBM, add to $A_3$ a commute with from $= a1$ and to $= a15$, and form a corr-set $R = \{(p_1', p_2'), (c_1, c_2')\}$ (all other corr-links are derivable from this data).


3.3 Update Propagation and Multiary (Delta) Lenses 


Update policies described in Sect. 2 can be extended to cover propagation of all updates $u_i$, $i \in 1\ldots 3$, according to the pattern in Fig. 3. This is a non-trivial task, but after it is accomplished, we have the following synchronization structure.


Definition 3 (Symmetric lenses). An n-ary symmetric lens is a pair $\ell = (\mathcal{A}, \mathsf{ppg})$ with $\mathcal{A}$ an n-ary multispace called the carrier of $\ell$, and $(\mathsf{ppg}_i)_{i \le n}$ an n-tuple of operations of the following arities. Operation $\mathsf{ppg}_i$ takes a corr $R$ (in fact, a multimodel) with boundary $\partial R = (A_1\ldots A_n)$ and an update $u_i\colon A_i \to A_i'$ as its input, and returns

(a) an $(n-1)$-tuple of updates $u_j'\colon A_j \to A_j''$ with $1 \le j \ne i \le n$;
(b) a reflective update $u_i'\colon A_i' \to A_i''$, also called an amendment of $u_i$;
(c) a new consistent corr $R'' \in \mathcal{A}^{\bowtie}(A_1''\ldots A_n'')$.


In fact, operations $\mathsf{ppg}_i$ complete a local update $u_i$ to an entire multimodel update with components $(u_j')_{j \ne i}$ and $u_i;u_i'$ (see Fig. 3).


Notation. If the first argument $R$ of operation $\mathsf{ppg}_i$ is fixed, the corresponding family of unary operations (whose only argument is $u_i$) will be denoted by $\mathsf{ppg}_i^R$. By taking the $j$th component of the multi-element result, we obtain single-valued unary operations $\mathsf{ppg}_{ij}^R$ producing, resp., updates $u_j' = \mathsf{ppg}_{ij}^R(u_i)\colon A_j' \to A_j''$. Note that $A_j' = A_j$ for all $j \ne i$ (see clause (a) of the definition), while $\mathsf{ppg}_{ii}^R$ yields the reflective update (b). We also have operation $\mathsf{ppg}_{i\star}^R$ returning a new consistent corr $R'' = \mathsf{ppg}_{i\star}^R(u_i)$ according to (c).

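Definition 4 (Closed updates). Given a lens $\ell = (\mathcal{A}, \mathsf{ppg})$ and a corr $R \in \mathcal{A}^{\bowtie}(A_1\ldots A_n)$, we call an update $u_i\colon A_i \to A_i'$ R-closed if $\mathsf{ppg}_{ii}^R(u_i) = \mathit{id}_{A_i'}$. An update is $\ell$-closed if it is R-closed for all $R$. Lens $\ell$ is called non-reflective at foot $\mathbf{A}_i$ if all updates in $\mathbf{A}_i^{\blacktriangleright}$ are $\ell$-closed.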


For the running example, the update propagation policies described in Sect. 2 give rise to a lens that is non-reflective at space $\mathbf{A}_3$.


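Definition 5 (Well-behavedness). A lens $\ell = (\mathcal{A}, \mathsf{ppg})$ is called well-behaved (wb) if the following laws hold for all $i \le n$, $A_i \in \mathbf{A}_i^{\bullet}$, $R \in \mathcal{A}^{\bowtie}(A_i, \_)$ and $u_i\colon A_i \to A_i'$, cf. Fig. 3.

(Stability)$_i$ $\forall j \in \{1\ldots n\}\colon \mathsf{ppg}_{ij}^R(\mathit{id}_{A_i}) = \mathit{id}_{A_j}$ and $\mathsf{ppg}_{i\star}^R(\mathit{id}_{A_i}) = R$
(Reflect1)$_i$ $(u_i, u_i') \in K^{\heartsuit}_{A_i'}$
(Reflect2)$_i$ $\forall j \ne i\colon \mathsf{ppg}_{ij}^R(u_i;u_i') = \mathsf{ppg}_{ij}^R(u_i)$
(Reflect3)$_i$ $\mathsf{ppg}_{ii}^R(u_i;u_i') = \mathit{id}_{A_i''}$

where $u_i' = \mathsf{ppg}_{ii}^R(u_i)$ as in Definition 3.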


Stability says that lenses do nothing voluntarily. Reflect1 says that amendment works towards "completion" rather than "undoing", and Reflect2-3 are idempotency conditions ensuring that the completion is indeed done.


Definition 6 (Invertibility). A wb lens is called (weakly) invertible if it satisfies the following law for any $i$, update $u_i\colon A_i \to A_i'$ and $R \in \mathcal{A}^{\bowtie}(A_i, \_)$:

(Invert)$_i$ for all $j \ne i$: $\mathsf{ppg}_{ij}^R(\mathsf{ppg}_{ji}^R(\mathsf{ppg}_{ij}^R(u_i))) = \mathsf{ppg}_{ij}^R(u_i)$

This law deals with "round-tripping": operation $\mathsf{ppg}_{ji}^R$ applied to update $u_j' = \mathsf{ppg}_{ij}^R(u_i)$ results in an update $\hat{u}_i$ equivalent to $u_i$ in the sense that $\mathsf{ppg}_{ij}^R(\hat{u}_i) = \mathsf{ppg}_{ij}^R(u_i)$ (see [3] for a motivating discussion).

Example 1 (Identity Lens $\iota(n\mathbf{A})$). Let $\mathbf{A}$ be an arbitrary model space. It generates an n-ary lens $\iota(n\mathbf{A})$ as follows: the carrier $\mathcal{A}$ has $n$ identical model spaces, $\mathbf{A}_i = \mathbf{A}$ for all $i \in \{1, \ldots, n\}$; it has $\mathcal{A}^{\bowtie} = \mathbf{A}^{\bullet}$, and the boundary functions are identities. All updates are propagated to themselves (hence the name of $\iota(n\mathbf{A})$). Obviously, $\iota(n\mathbf{A})$ is a wb, invertible lens that is non-reflective at all its feet.
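The identity lens is small enough to encode directly, which also shows the shape of the propagation operations of Definition 3. In this hypothetical Python sketch, updates are (source, target) pairs, a corr is a single model, and feet are numbered from 0; ppg returns the propagated updates, the amendment and the new corr.

```python
# Hypothetical encoding of the identity lens iota(nA) from Example 1.
# Models are hashable values, an update is a (source, target) pair, a corr is a
# single model of A, and feet are indexed 0..n-1 (instead of 1..n).

def identity_update(model):
    """Idle update id_A: A -> A."""
    return (model, model)

def make_identity_lens(n):
    def ppg(i, corr, update):
        """ppg_i^R(u_i): returns ({j: u_j'} for j != i, amendment u_i', new corr R'')."""
        source, target = update
        if source != corr:
            raise ValueError("update must start at foot i of the corr")
        others = {j: update for j in range(n) if j != i}  # u_i is copied to every foot
        return others, identity_update(target), target    # amendment is always idle
    return ppg

def is_stable(ppg, n, corr):
    """(Stability): idle updates propagate to idle updates and keep the corr."""
    idle = identity_update(corr)
    return all(ppg(i, corr, idle) == ({j: idle for j in range(n) if j != i}, idle, corr)
               for i in range(n))

ppg = make_identity_lens(3)
A = frozenset({("empl", "John", "IBM")})
A_new = A | {("empl", "Mary", "IBM")}
others, amendment, new_corr = ppg(0, A, (A, A_new))
print(is_stable(ppg, 3, A), amendment == identity_update(A_new))  # True True
```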
4 Compositionality of Update Propagation: Playing Lego with Lenses


We study how lenses can be composed. Parallel constructions are easy to manage and are excluded from the paper to save space (they can be found in the long version [1, Sect. 4.1]). More challenging are sequential constructs, in which different lenses share some of their feet, and updates propagated by one lens are taken and propagated further by one or several other lenses. In Sect. 4.1, we consider a rich example of such a construct, the star composition of lenses. In Sect. 4.2, we study how (symmetric) lenses can be assembled from asymmetric ones.

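Since we now work with several lenses, we need a notation for lens components. Given a lens $\ell = (\mathcal{A}, \mathsf{ppg})$, we write $\ell^{\bowtie} \overset{\text{def}}{=} \mathcal{A}^{\bowtie}$ for its set of corrs. Feet are written $\partial_i\ell$ ($i$-th boundary space) and $\partial_i^{\ell}R$ for the $i$-th boundary of a corr $R \in \ell^{\bowtie}$. Propagation operations of the lens $\ell$ are denoted by $\ell.\mathsf{ppg}_{ij}^R$, $\ell.\mathsf{ppg}_{i\star}^R$.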


4.1 Star Composition 


Running Example Continued. The diagram in Fig. 4 presents a refinement of our example, which explicitly includes relational storage models $B_{1,2}$ for the two data sources. We assume that object models $A_{1,2}$ are simple projective views of databases $B_{1,2}$: data in $A_i$ are copied from $B_i$ without any transformation, while additional tables and attributes that $B_i$-data may have are excluded from the view $A_i$; the traceability mappings $R_i\colon A_i \leftrightarrow B_i$ are thus embeddings. We further assume that synchronization of bases $B_i$ and their views $A_i$ is realized by simple constant-complement lenses $b_i$, $i = 1, 2$ (see, e.g., [9]). Finally, let $k$ be a lens synchronizing models $A_1, A_2, A_3$ as described in Sect. 2, and let $R \in k^{\bowtie}(A_1, A_2, A_3)$ be a corr for some $A_3$ not shown in the figure.

Consider the following update propagation scenario. Suppose that at some moment we have consistency $(R_1, R, R_2)$ of all five models, and then $B_1$ is updated with $u_1\colon B_1 \to B_1'$ that, say, adds to $B_1$ a record of Mary working for Google as discussed in Sect. 2. Consistency is restored with a four-step propagation procedure shown by double arrows labeled by $x{:}y$, with $x$ the step number and $y$ the lens doing the propagation. Step 1: lens $b_1$ propagates update $u_1$ to $v_1$ that adds (Mary, Google) to view $A_1$, with no amendment to $u_1$ as $v_1$ is just a projection of $u_1$. Note also the updated traceability mapping $R_1'\colon A_1' \leftrightarrow B_1'$. Step 2: lens $k$ propagates $v_1$ to $v_2''$ that adds (Google, Mary) to $A_2$, and amends $v_1$ with $v_1''$ that adds (Mary, IBM) to $A_1$; a new consistent corr $R''$ is also computed. Step 3: lens $b_2$ propagates $v_2''$ to $u_2''$ that adds Mary's employment by Google to $B_2$ with, perhaps, some other specific relational storage changes not visible in $A_2$. We assume no amendment to $v_2''$, as otherwise access to relational storage would amend application data, and thus we have a consistent corr $R_2''$ as shown. Step 4: lens $b_1$ maps update $v_1''$ (see Step 2) backward to $u_1''$ that adds (Mary, IBM) to $B_1$, so that $B_1''$ includes both (Mary, Google) and (Mary, IBM), and a respective consistent corr $R_1''$ is provided. There is no amendment for $v_1''$ for the same reason as in Step 3.

Fig. 4. Running example via lenses

Thus, all five models in the bottom line of Fig. 4 ($A_3''$ is not shown) are mutually consistent, and all show that Mary is employed by IBM and Google. Synchronization is restored, and we can consider the entire scenario as propagation of $u_1$ to $u_2''$ and its amendment with $u_1''$, so that finally we have a consistent corr $(R_1'', R'', R_2'')$ interrelating $B_1''$, $A_3''$, $B_2''$. Amendment $u_1''$ is compatible with $u_1$ as nothing is undone, and condition $(u_1, u_1'') \in K^{\heartsuit}_{B_1'}$ holds; the other two equations required by Reflect2-3 for the pair $(u_1, u_1'')$ also hold. For our simple projection views, these conditions will hold for other updates too, and we have a well-behaved propagation from $B_1$ to $B_2$ (and trivially to $A_3$). Similarly, we have a wb propagation from $B_2$ to $B_1$ and $A_3$. Propagation from $A_3$ to $B_{1,2}$ is non-reflective and done in two steps: first lens $k$ works, then lenses $b_i$ work as described above (and updates produced by $k$ are $b_i$-closed). Thus, we have built a wb ternary lens synchronizing spaces $B_1$, $B_2$ and $A_3$ by joining lenses $b_1$ and $b_2$ to the central lens $k$.


Discussion. Reflection is a crucial aspect of lens composition. The inset figure describes the scenario above as a transition system and shows that Steps 3 and 4 can go concurrently. It is the non-trivial amendment created in Step 2 that makes Step 4 necessary; otherwise Step 3 would finish consistency restoration (with Step 4 being an idle transition). On the other hand, if update $v_2''$ in Fig. 4 were not closed for lens $b_2$, we would have yet another concurrent step complicating the scenario. Fortunately, for our example with simple projective views, Step 4 is simple and provides a non-conflicting amendment, but the case of more complex views beyond the constant-complement class needs care and investigation. Below we specify a simple situation of lens composition with reflection excluded a priori, and leave more complex cases for future work.


Formal Definition. Suppose we have an n-ary lens $k = (\mathcal{A}, \mathsf{ppg})$ and, for every $i \le n$, a binary lens $b_i = (\mathbf{A}_i, \mathbf{B}_i, b_i.\mathsf{ppg})$ with the first model space $\mathbf{A}_i$ being the $i$th model space of $k$ (see Fig. 5, where $k$ is depicted in the center and the $b_i$ are shown as ellipses adjoining $k$'s feet). We also assume the following Junction conditions: for any $i \le n$, all updates propagated to $\mathbf{A}_i$ by lens $b_i$ are $k$-closed, and all updates propagated to $\mathbf{A}_i$ by lens $k$ are $b_i$-closed.

Fig. 5. Star composition
Below we will write a corr $R_i \in b_i^{\bowtie}(A_i, B_i)$ as $R_i\colon A_i \leftrightarrow B_i$, and the six operations $b_i.\mathsf{ppg}_{xy}^{R_i}$ as the family $(b_i.\mathsf{ppg}_{xy}^{R_i})_{x \in \{A,B\},\, y \in \{A,B,\star\}}$. Likewise, we write $\partial_x^{b_i}$ with $x \in \{A, B\}$ for the boundary functions of lenses $b_i$.

The above configuration gives rise to the following n-ary lens $\ell$. The carrier is the tuple of model spaces $\mathbf{B}_1\ldots\mathbf{B}_n$, and corrs are tuples $(R, R_1\ldots R_n)$ with $R \in k^{\bowtie}$ and $R_i \in b_i^{\bowtie}$ such that $\partial_i^{k}R = \partial_A^{b_i}R_i$ for all $i \in 1\ldots n$. Moreover, we define $\partial_i^{\ell}(R, R_1\ldots R_n) \overset{\text{def}}{=} \partial_B^{b_i}R_i$ (see Fig. 5). Operations are defined as compositions of consecutive lens executions as described below (we will use the dot notation for operation application and write $x.\mathit{op}$ for $\mathit{op}(x)$, where $x$ is an argument).

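Given a model tuple $(B_1\ldots B_n) \in \mathbf{B}_1 \times \cdots \times \mathbf{B}_n$, a corr $(R, R_1\ldots R_n)$, and an update $v_i\colon B_i \to B_i'$ in $\mathbf{B}_i^{\blacktriangleright}$, we define, first for $j \ne i$,

$$v_i.\ell.\mathsf{ppg}_{ij}^{(R, R_1\ldots R_n)} \overset{\text{def}}{=} v_i.(b_i.\mathsf{ppg}_{BA}^{R_i}).(k.\mathsf{ppg}_{ij}^{R}).(b_j.\mathsf{ppg}_{AB}^{R_j})$$

and $v_i.\ell.\mathsf{ppg}_{ii}^{(R, R_1\ldots R_n)} \overset{\text{def}}{=} v_i.b_i.\mathsf{ppg}_{BB}^{R_i}$ for $j = i$. Note that all internal amendments to $u_i = v_i.(b_i.\mathsf{ppg}_{BA}^{R_i})$ produced by $k$, and to $u_j' = u_i.(k.\mathsf{ppg}_{ij}^{R})$ produced by $b_j$, are identities due to the Junction conditions. This allows us to set corrs properly and finish propagation with the three steps above: $v_i.\ell.\mathsf{ppg}_{i\star}^{(R, R_1\ldots R_n)} \overset{\text{def}}{=} (R', R_1'\ldots R_n')$ where $R' = u_i.k.\mathsf{ppg}_{i\star}^{R}$, $R_j' = u_j'.b_j.\mathsf{ppg}_{A\star}^{R_j}$ for $j \ne i$, and $R_i' = v_i.b_i.\mathsf{ppg}_{B\star}^{R_i}$. We thus have a lens $\ell$ denoted by $k{*}(b_1, \ldots, b_n)$.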
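The composed operation $\ell.\mathsf{ppg}_{ij}$ can be sketched as a pipeline of three functions. The Python sketch below makes strong simplifying assumptions: each $b_i$ is a projective view keeping facts tagged empl, the central lens $k$ is replaced by the identity lens of Example 1 rather than the lens of Sect. 2, and only the propagated updates (not the corrs) are computed.

```python
# Hypothetical sketch of v_i.(b_i.ppg_BA).(k.ppg_ij).(b_j.ppg_AB) for the star
# composition. Assumptions: bases and views are frozensets of tagged facts, each
# b_i is a projective view keeping "empl" facts, and the central lens k is the
# identity lens of Example 1. Deltas are (added, deleted) pairs of frozensets.

VIEW_TAGS = {"empl"}

def view(base):
    """The view A_i of a base B_i: its "empl" facts."""
    return frozenset(f for f in base if f[0] in VIEW_TAGS)

def apply(model, delta):
    added, deleted = delta
    return (model - deleted) | added

def b_ppg_BA(delta):
    """b_i, base to view: restrict a base delta to view facts (no amendment)."""
    added, deleted = delta
    return view(added), view(deleted)

def k_ppg(delta):
    """Central lens k: the identity lens propagates view deltas unchanged."""
    return delta

def b_ppg_AB(delta):
    """b_j, view to base: a projective view delta is replayed on the base,
    leaving the complement (non-view facts) untouched."""
    return delta

def star_ppg(delta):
    """Steps b_i, k, b_j in sequence; all internal amendments are identities,
    which is what the Junction conditions guarantee."""
    return b_ppg_AB(k_ppg(b_ppg_BA(delta)))

B1 = frozenset({("empl", "John", "IBM"), ("audit", "r1")})
B2 = frozenset({("empl", "John", "IBM"), ("index", "i7")})
v1 = (frozenset({("empl", "Mary", "Google"), ("audit", "r2")}), frozenset())
B2_new = apply(B2, star_ppg(v1))
print(view(apply(B1, v1)) == view(B2_new))  # True
```

The storage-specific fact ("audit", "r2") stays in $B_1$, while $B_2$ keeps its own ("index", "i7"): only the shared view part travels through the star.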


Theorem 1 (Star Composition). Given a star configuration of lenses as above, if lens $k$ fulfills Stability, all lenses $b_i$ are wb, and the Junction conditions hold, then the composed lens $k{*}(b_1, \ldots, b_n)$ defined above is wb, too.


Proof. Laws Stability and Reflect1 for the composed lens are straightforward. Reflect2-3 also follow immediately, since the first step of the above propagation procedure already enjoys idempotency by Reflect2-3 for $b_i$.


4.2 Assembling n-ary Lenses from Binary Lenses 


This section shows how to assemble n-ary (symmetric) lenses from binary asym- 
metric lenses modelling view computation [2]. As the latter is a typical bx, 
the well-behavedness of asymmetric lenses has important distinctions from well- 
behavedness of general (symmetric mx-tailored) lenses. 


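Definition 7 (Asymmetric Lens, cf. [2]). An asymmetric lens (a-lens) is a tuple $b^{\blacktriangle} = (\mathbf{A}, \mathbf{B}, \mathsf{get}, \mathsf{put})$ with $\mathbf{A}$ a model space called the (abstract) view, $\mathbf{B}$ a model space called the base, $\mathsf{get}\colon \mathbf{B} \to \mathbf{A}$ a functor (read "get the view"), and $\mathsf{put}$ a family of operations $(\mathsf{put}^B \mid B \in \mathbf{B}^{\bullet})$ (read "put the view update back") of the following arity. Provided with a view update $v\colon \mathsf{get}(B) \to A'$ at the input, operation $\mathsf{put}^B$ outputs a base update $\mathsf{put}^B_{\mathrm{b}}(v) = u'\colon B \to B''$ and a reflected view update $\mathsf{put}^B_{\mathrm{v}}(v) = v'\colon A' \to A''$ such that $A'' = \mathsf{get}(B'')$. A view update $v\colon \mathsf{get}(B) \to A'$ is called closed if $\mathsf{put}^B_{\mathrm{v}}(v) = \mathit{id}_{A'}$.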
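The following is a specialization of Definition 5.
Definition 8 (Well-behavedness). An a-lens is well-behaved (wb) if it satisfies the following laws for all $B \in \mathbf{B}^{\bullet}$ and $v\colon \mathsf{get}(B) \to A'$:

(Stability) $\mathsf{put}^B_{\mathrm{b}}(\mathit{id}_{\mathsf{get}(B)}) = \mathit{id}_B$
(Reflect0) $\mathsf{put}^B_{\mathrm{v}}(v) \ne \mathit{id}_{A'}$ implies $A' \ne \mathsf{get}(X)$ for all $X \in \mathbf{B}^{\bullet}$
(Reflect1) $(v, \mathsf{put}^B_{\mathrm{v}}(v)) \in K^{\heartsuit}_{A'}$
(Reflect2) $\mathsf{put}^B_{\mathrm{b}}(v;\mathsf{put}^B_{\mathrm{v}}(v)) = \mathsf{put}^B_{\mathrm{b}}(v)$
(PutGet) $v;\mathsf{put}^B_{\mathrm{v}}(v) = \mathsf{get}(\mathsf{put}^B_{\mathrm{b}}(v))$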


In contrast to the general lens case, a wb a-lens features Reflect0, a sort of self-Hippocraticness important for bx. Another distinction is the inclusion of a strong invertibility law PutGet into the definition of well-behavedness: PutGet together with Reflect2 provides (weak) invertibility, $\mathsf{put}^B_{\mathrm{b}}(\mathsf{get}(\mathsf{put}^B_{\mathrm{b}}(v))) = \mathsf{put}^B_{\mathrm{b}}(v)$. Reflect3 is omitted as it is implied by Reflect0 and PutGet.

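Any a-lens $b^{\blacktriangle} = (\mathbf{A}, \mathbf{B}, \mathsf{get}, \mathsf{put})$ gives rise to a binary symmetric lens $b$. Its carrier consists of model spaces $\mathbf{A}$ and $\mathbf{B}$. Furthermore, $b^{\bowtie} = \mathbf{B}^{\bullet}$ with boundary mappings defined as follows: for $R \in b^{\bowtie} = \mathbf{B}^{\bullet}$, $\partial_A R = \mathsf{get}(R)$ and $\partial_B R = R$. Thus, the set of corrs $b^{\bowtie}(A, B)$ is $\{B\}$ if $A = \mathsf{get}(B)$, and is empty otherwise.

For a corr $B$, we need to define six operations $b.\mathsf{ppg}_{xy}^B$. If $v\colon A \to A'$ is a view update, then $\mathsf{ppg}_{AB}^B(v) = \mathsf{put}^B_{\mathrm{b}}(v)\colon B \to B''$, $\mathsf{ppg}_{AA}^B(v) = \mathsf{put}^B_{\mathrm{v}}(v)\colon A' \to A''$, and $\mathsf{ppg}_{A\star}^B(v) = B''$. The condition $A'' = \mathsf{get}(B'')$ for $b^{\blacktriangle}$ means that $B''$ is again a consistent corr with the desired boundaries. For a base update $u\colon B \to B'$ and corr $B$, $\mathsf{ppg}_{BA}^B(u) = \mathsf{get}(u)$, $\mathsf{ppg}_{BB}^B(u) = \mathit{id}_{B'}$, and $\mathsf{ppg}_{B\star}^B(u) = B'$. Functoriality of $\mathsf{get}$ yields consistency of $B'$.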
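The a-lens laws and the induced symmetric lens can be tried out on a small state-based example. In this hypothetical Python sketch, views show empl and person facts, the base may hold more (e.g. audit facts), and an assumed base invariant requires a person fact for every employed person; put amends a view update by completing missing person facts, in the spirit of the Mary example of Sect. 2.3. Updates are represented by their target states only, and Reflect1 is not checked.

```python
# Hypothetical a-lens: bases and views are frozensets of tagged facts.
# The view shows "empl" and "person" facts; the base may hold more (e.g. "audit").
# Assumed base invariant: every employed person also has a "person" fact.
# Updates are represented by their target states (a state-based simplification).

VIEW_TAGS = {"empl", "person"}

def get(base):
    return frozenset(f for f in base if f[0] in VIEW_TAGS)

def complete(view):
    """Add the "person" facts required by the invariant (the amendment)."""
    return view | {("person", f[1]) for f in view if f[0] == "empl"}

def put(base, new_view):
    """For a view update get(base) -> new_view, return (B'', A''), where B''
    keeps the base complement and A'' = get(B'') is the amended view."""
    new_base = (base - get(base)) | complete(frozenset(new_view))
    return new_base, get(new_base)

# Induced symmetric lens b: a corr is a base model B with boundary (get(B), B).
def ppg_from_view(corr, new_view):
    """ppg_AB, ppg_AA, ppg_A*: new base, amended view, new corr."""
    new_base, amended_view = put(corr, new_view)
    return new_base, amended_view, new_base

def ppg_from_base(corr, new_base):
    """ppg_BA, ppg_BB, ppg_B*: new view, (closed) base, new corr."""
    return get(new_base), new_base, new_base

B = frozenset({("empl", "John", "IBM"), ("person", "John"), ("audit", "r1")})
A1 = get(B) | {("empl", "Mary", "Google")}  # Mary's person fact is missing
B2, A2, R2 = ppg_from_view(B, A1)
print(A2 - A1)  # frozenset({('person', 'Mary')})
```

The amendment only adds facts, and a view that already satisfies the invariant comes back unchanged, as Reflect0 requires.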


Lemma 1. Let $b^{\blacktriangle}$ be a wb a-lens and $b$ the corresponding symmetric lens. Then all base updates of $b$ are closed, and $b$ is wb and invertible.


Proof. Base updates are closed by the definition of $\mathsf{ppg}_{BB}$. Well-behavedness follows from wb-ness of $b^{\blacktriangle}$. Invertibility has to be proved in two directions: $\mathsf{ppg}_{BA};\mathsf{ppg}_{AB};\mathsf{ppg}_{BA} = \mathsf{ppg}_{BA}$ follows from (PutGet) and (Reflect0); the other direction follows from (PutGet) and (Reflect2), see the remark after Definition 8.


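Theorem 2 (Lenses from Spans). An n-ary span of wb a-lenses $b_i^{\blacktriangle} = (\mathbf{A}_i, \mathbf{B}, \mathsf{get}_i, \mathsf{put}_i)$, $i = 1..n$, with common base $\mathbf{B}$ of all $b_i^{\blacktriangle}$ gives rise to a wb (symmetric) lens denoted by $\Sigma_{i=1}^{n} b_i^{\blacktriangle}$.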


Proof. An n-ary span of a-lenses $b_i^{\blacktriangle}$ (all of them interpreted as symmetric lenses $b_i$ as explained above) is a construct equivalent to the star composition defined in Sect. 4.1, in which lens $k = \iota(n\mathbf{B})$ (cf. Example 1) and the peripheral lenses are the lenses $b_i$. The Junction conditions are satisfied as all base updates are $b_i$-closed for all $i$ by Lemma 1, and also trivially closed for any identity lens. The theorem thus follows from Theorem 1. Note that a corr in $(\Sigma_{i=1}^{n} b_i^{\blacktriangle})^{\bowtie}$ is nothing but a single model $B \in \mathbf{B}^{\bullet}$ with boundaries being the respective $\mathsf{get}_i$-images.
The theorem shows that combining a-lenses in this way yields an n-ary symmetric lens whose properties can automatically be inferred from the binary a-lenses.
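The construction of Theorem 2 can be sketched with projection a-lenses over a shared base. In this hypothetical Python sketch, the base is a set of tagged facts, foot $i$ sees the facts whose tag is in TAGS[i], and employment facts are visible at feet 1 and 2 (ignoring the empl-er/empl-ee orientation); propagating from foot $i$ is a put into the base followed by get at every foot, i.e., the star composition with an identity lens on the base.

```python
# Hypothetical n-ary lens from a span of projection a-lenses with common base B.
# The base is a frozenset of tagged facts; foot i sees the facts whose tag is in
# TAGS[i]. Overlapping views make propagation between feet non-trivial.
# Updates are represented by their target states (a state-based simplification).

TAGS = {1: {"empl", "lives"}, 2: {"empl", "located"}, 3: {"commute"}}

def get(i, base):
    """get_i: project the common base onto foot i."""
    return frozenset(f for f in base if f[0] in TAGS[i])

def put(i, base, new_view):
    """put_i: replace foot i's part of the base, keeping facts foot i cannot see.
    Assumes new_view only holds facts with tags in TAGS[i], so it is closed."""
    return (base - get(i, base)) | frozenset(new_view)

def ppg(i, corr, new_view):
    """ppg_i of the composed lens: put into the base, then get every foot.
    The new corr is the new base model."""
    new_base = put(i, corr, new_view)
    return {j: get(j, new_base) for j in TAGS}, new_base

B = frozenset({("empl", "John", "IBM"), ("lives", "John", "a1"),
               ("located", "IBM", "a15")})
views, B_new = ppg(1, B, get(1, B) | {("empl", "Mary", "IBM")})
print(("empl", "Mary", "IBM") in views[2])  # True
```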


Running example. Figure 6 shows a metamodel $M^{+}$ obtained by merging the three metamodels $M_{1,2,3}$ from Fig. 1 without loss and duplication of information. In addition, for persons and companies, the identifiers of model spaces in which a given person or company occurs can be traced back via attribute "spaces" (Commute-objects are known to appear in space $\mathbf{A}_3$ and hence do not need such an attribute). As shown in [10], any consistent multimodel $(A_1\ldots A_n, R)$ can be merged into a comprehensive model $A^{+}$ instantiating $M^{+}$. Let $\mathbf{B}$ be the space of such models together with their comprehensive updates $u^{+}\colon A^{+} \to A'^{+}$.

For a given $i \le 3$, we can define the following a-lens $b_i^{\blacktriangle} = (\mathbf{A}_i, \mathbf{B}, \mathsf{get}_i, \mathsf{put}_i)$: $\mathsf{get}_i$ takes an update $u^{+}$ as above and outputs its restriction to the model containing only objects recorded in space $\mathbf{A}_i$. Operation $\mathsf{put}_i$ takes an update $v_i\colon A_i \to A_i'$ and first propagates it in all directions as discussed in Sect. 2, then merges these propagated local updates into a comprehensive $\mathbf{B}$-update between comprehensive models. This yields a span of a-lenses that implements the same synchronization behaviour as the symmetric lens discussed in Sect. 2.

Fig. 6. Merged metamodel


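From lenses to spans. There is also a backward transformation of (symmetric) lenses to spans of a-lenses. Let $\ell = (\mathcal{A}, \mathsf{ppg})$ be a wb lens. It gives rise to the following span of wb a-lenses $\ell_i^{\blacktriangle} = (\partial_i\mathcal{A}, \mathbf{B}, \mathsf{get}_i, \mathsf{put}_i)$, where space $\mathbf{B}$ is built from consistent multimodels and their updates, and functors $\mathsf{get}_i\colon \mathbf{B} \to \mathbf{A}_i$ are projection functors. Given $B = (A_1\ldots A_n, R)$ and update $u_i\colon A_i \to A_i'$, let

$$\mathsf{put}^B_{i,\mathrm{b}}(u_i) \overset{\text{def}}{=} (u_1', \ldots, u_{i-1}', (u_i;u_i'), u_{i+1}', \ldots, u_n')\colon (A_1\ldots A_n, R) \to (A_1''\ldots A_n'', R'')$$

where $u_j' = \mathsf{ppg}_{ij}^R(u_i)$ (all $j$) and $R'' = \mathsf{ppg}_{i\star}^R(u_i)$. Finally, $\mathsf{put}^B_{i,\mathrm{v}}(u_i) = \mathsf{ppg}_{ii}^R(u_i)$. Validity of Stability, Reflect0-2 and PutGet directly follows from the above definitions.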

An open question is whether the span-to-lens transformation in Theorem 2 and the lens-to-span transformation described above are mutually inverse. The results for the binary case in [8] show that this is only the case modulo certain equivalence relations. These equivalences may be different for our reflective multiary lenses, and we leave this important question for future research.


5 Related Work 


For state-based lenses, the work closest in spirit is Stevens' paper [16]. Her goals and ours are similar, but the technical realisations are different even besides the state- vs. delta-based opposition. Stevens works with restorers, which take a multimodel (in the state-based setting, just a tuple of models), presumably inconsistent, and restore consistency by changing some models in the tuple while keeping other models (from the authority set) unchanged. In contrast, lenses take a consistent multimodel and updates, and return a consistent multimodel and updates. Also, update amendments are not considered in [16]: models in the authority set are intact.

Another distinction is how the multiary vs. binary issue is treated. Stevens provides several results for decomposing an n-ary relation $R \subseteq A_1 \times \cdots \times A_n$ into binary relations $R_{ij} \subseteq A_i \times A_j$ between the components. For us, a relation is a span, i.e., a set $R$ endowed with an n-tuple of projections $\partial_i\colon R \to A_i$ uniquely identifying elements of $R$. Thus, while Stevens considers “binarisation” of a relation $R$ over its boundary $A_1 \ldots A_n$, we “binarise” it via the corresponding span (the UML would call it reification). Our (de)composition results demonstrate advantages of the span view. Discussion of several other works in the state-based world, notably by Macedo et al. [12], can be found in [16].

Compositionality as a fundamental principle for building synchronization 
tools was proposed by Pierce and his coauthors, and realized for several types of 
binary lenses in [4,6,7]. In the delta-lens world, a fundamental theory of equiva- 
lence of symmetric lenses and spans of a-lenses (for the binary case) is developed 
by Johnson and Rosebrugh [8], but they do not consider reflective updates. The 
PutGetPut law has been discussed (in the different context of state-based asymmetric injective editing) in several early bx works from Tokyo, e.g., [13]. A notion close to our update compatibility was proposed by Orejas et al. in [14]. We are not
aware of multiary update propagation work in the delta-lens world. Considering 
amendment and its laws in the delta lens setting is also new. 

In [11], Königs and Schürr introduced multigraph grammars (MGGs) as a multiary version of the well-known triple graph grammars (TGGs). Their multi-domain-integration rules specify how all involved graphs evolve simultaneously. The idea of an additional correspondence graph is close to our consistent corrs. However, their scenarios are specialized towards (1) directed graphs, (2) MOF-compliant artifacts like QVT, and (3) the global consistency view on a multimodel rather than update propagation.


6 Conclusions and Future Work 


We have considered multiple model synchronization via multi-directional update propagation, and argued that reflective propagation to the model whose change caused the inconsistency is a reasonable feature of the scenario. We presented a mathematical framework for such synchronization based on a multiary generalisation of binary symmetric delta lenses introduced earlier in [3], and enriched it with reflective propagation. Our lens composition results make the framework interesting for practical applications, but so far it has an essential limitation: we consider consistency violation caused by only one model change, and thus consistency is restored by propagating only one update, while in practice we often deal with several models changing concurrently. If these updates are in conflict, consistency restoration needs conflict resolution, and hence a substantial development of the framework.

There are also several open issues for the non-concurrent case considered in the paper (and its future concurrent generalisation). First, our pool of lens composition constructs is far from complete (because of both space limitations and the necessity of further research). We need to enrich it with (i) sequential composition of (reflective) a-lenses so that a category of a-lenses could be built, and (ii) a relational composition of symmetric lenses sharing several of their feet (similar to relational join). It is also important to investigate composition with weaker junction conditions than those we considered. Another important issue is invertibility, which fits nicely into some but not all of our results; this indicates that we do not yet understand the nature of invertibility well and that further investigation is needed. We conjecture that while invertibility is essential for bx, its role for mx may be less important. Last but not least, the (in)famous PutPut law awaits exploration in the case of multiary reflective propagation: how well our update propagation operations are compatible with update composition is a very important issue to explore. Finally, paper [5] shows how binary delta lenses can be implemented with TGGs, and we expect that MGGs could play a similar role for multiary delta lenses.
Abstract. Refactorings constitute an effective means to improve the quality and maintainability of evolving object-oriented programs. Search-based techniques have shown promising results in finding optimal sequences of behavior-preserving program transformations that (1) maximize code-quality metrics and (2) minimize the number of changes. However, the impact of refactorings on extra-functional properties like security has received little attention so far. To this end, we propose, as a further objective, to minimize the attack surface of programs (i.e., to maximize the strictness of declared accessibility of class members). Minimizing the attack surface naturally competes with the applicability of established MoveMethod refactorings for improving coupling/cohesion metrics. Our tool implementation is based on an EMF meta-model for Java-like programs and utilizes MOMoT, a search-based model-transformation framework. Our experimental results, obtained from a collection of real-world Java programs, show the impact of attack-surface minimization on design-improving refactorings under different accessibility-control strategies. We further compare the results to those of existing refactoring tools.


1 Introduction 


The essential activity in designing object-oriented programs is to identify class 
candidates and to assign responsibility (i.e., data and operations) to them. An 
appropriate solution to this Class-Responsibility-Assignment (CRA) problem, on 
the one hand, intuitively reflects the problem domain and, on the other hand, 
exhibits acceptable quality measures [4]. In this context, refactoring has become 
a key technique for agile software development: productive program-evolution 
phases are interleaved with behavior-preserving code transformations for updating CRA decisions, to proactively maintain, or even improve, code-quality metrics [13,29]. Each refactoring pursues a trade-off between two major, and generally contradicting, objectives: (1) maximizing code-quality metrics, including fine-grained coupling/cohesion measures as well as coarse-grained anti-pattern avoidance, and (2) minimizing the number of changes to preserve the initial program design as much as possible [8]. Manual search for refactorings that sufficiently meet both objectives becomes impractical even for medium-size programs, as it requires finding optimal sequences of interdependent code transformations with complex constraints [10]. The very large search space and multiple competing objectives make the underlying optimization problem well-suited for search-based optimization [15], for which various semi-automated approaches for recommending refactorings have recently been proposed [18,27,28,30,34].

The validity of proposed refactorings is mostly concerned with purely functional behavior preservation [24], whereas their impact on extra-functional properties like program security has received little attention so far [22]. However, applying elaborate information-flow metrics for identifying security-preserving refactorings is computationally too expensive in practice [36]. As an alternative, we consider attack-surface metrics as a sufficiently reliable, yet easy-to-compute indicator for preservation of program security [20,41]. Attack surfaces of programs comprise all conventional ways for users or attackers to enter a piece of software (e.g., invoking API methods or inheriting from super-classes), such that an unnecessarily large surface increases the danger of exploiting vulnerabilities. Hence, the goal of a secure program design should be to grant least privileges to class members to reduce the extent to which data and operations are exposed to the world [41]. In JAVA-like languages, accessibility constraints by means of the modifiers public, private and protected provide a built-in low-level mechanism for controlling and restricting information flow within and across classes, sub-classes and packages [38]. Accessibility constraints introduce compile-time security barriers protecting trusted system code from untrusted mobile code [19]. As a downside, restricted accessibility privileges naturally obstruct possibilities for refactorings, as CRA updates (e.g., moving members [34]) may either be rejected by those constraints or require relaxing accessibility privileges, thus increasing the attack surface [35].

In this paper, we present a search-based technique to find optimal sequences of refactorings for object-oriented JAVA-like programs, by explicitly taking accessibility constraints into account. To this end, we do not propose novel refactoring operations, but rather apply established ones and control their impact on attack-surface metrics. We focus on MoveMethod refactorings, which have been proven effective for improving CRA metrics [34], in combination with operations for on-demand strengthening and relaxing of accessibility declarations [38]. As objectives, we consider (O1) elimination of design flaws, particularly (O1a) optimization of object-oriented coupling/cohesion metrics [5,6] and (O1b) avoidance of anti-patterns, namely The Blob, (O2) preservation of the original program design (i.e., minimizing the number of change operations), and (O3) attack-surface minimization. Our model-based tool implementation, called GOBLIN, represents individuals (i.e., intermediate refactoring results) as program-model instances complying with an EMF meta-model for JAVA-like programs [33]. Hence, instead of regenerating source code after every single refactoring step, we apply and evaluate sequences of refactoring operations, specified as
model-transformation rules in HENSHIN [2], on the program model. To this end, we apply MOMoT [11], a generic framework for search-based model transformations. Our experimental evaluation results, gained from applying GOBLIN as well as the recent tools JDEODORANT [12] and CODE-IMP [27] to a collection of real-world JAVA programs, provide us with in-depth insights into the subtle interplay between traditional code-quality metrics and attack-surface metrics. Our tool and all experiment results are available on the GitHub site of the project¹.

Fig. 1. UML class diagram of MAILAPP


2 Background and Motivation 


We first introduce a running example to provide the necessary background and 
to motivate the proposed refactoring methodology. 


Running Example. We consider a (simplified) e-mail client, called MAILAPP, 
implemented in JAVA. Figure 1 shows the UML class diagram of MAILAPP, where 
security-critical extensions (in gray) will be described below. We use the stereotype «pkg: name» to annotate classes with package declarations. Central class MailApp is responsible for handling objects of classes Message and Contact, both encapsulating application data and operations to access their attributes. The
text of a message may be formatted as plain String, or it may be converted into 
HTML using method plainToHtml(). 


Design Flaws in Object-Oriented Programs. The over-centralized architec- 
tural design of MAILAPP, consisting of a predominant controller class (MailApp) 
intensively accessing inactive data classes (Message and Contact), is frequently 
referred to as The Blob anti-pattern [7]. As a consequence, method plainToHtml() 
in class MailApp frequently calls method getPlainText() in class Message across 
class- and even package-boundaries. The Blob and other design flaws are widely considered harmful with respect to software quality in general and program maintainability in particular [7]. For instance, assume that a developer extends MailApp by (1) adding further classes SecureMailApp and RsaAdapter for encrypting and
signing messages, and by (2) extending class Contact with public RSA key han- 
dling: method findKey() searches for public RSA keys of contacts by repeatedly 
calling method findKeyFromServer() with the URL of available key servers. This 
program evolution further decays the already flawed design of MAILAPP as class 
SecureMailApp may be considered as a second instance of The Blob anti-pattern: 
method encryptMessage() of class SecureMailApp intensively calls method findKey() in class Contact. This example illustrates a well-known dilemma of agile
program development in an object-oriented world: Class-Responsibility Assign- 
ment decisions may become unbalanced over time, due to unforeseen changes 
crosscutting the initial program design [31]. As a result, the majority of object-oriented design flaws like The Blob anti-pattern are mainly caused by low cohesion/high coupling ratios within/among classes and their members [5,6].


Refactoring of Object-Oriented Programs. Object-oriented refactorings 
constitute an emerging and widely used counter-measure against design 
flaws [13]. Refactorings impose systematic, semantic-preserving program trans- 
formations for continuously improving code-quality measures of evolving source 
code. For instance, the MoveMethod refactoring is frequently used to update 
CRA decisions after program changes, by moving method implementations 
between classes [34]. Applied to our example, a developer may (manually) con- 
duct two refactorings, R1 and R2, to counteract the aforementioned design 
flaws: 


(R1) move method plainToHtml() from class MailApp to class Message, and 
(R2) move method encryptMessage() from class SecureMailApp to class Contact. 


However, for programs of realistic size and complexity, tool support for (semi-)automated program refactoring becomes increasingly indispensable. The major challenges in finding effective sequences of object-oriented refactoring operations consist in detecting flawed program parts to be refactored, as well as in recommending program transformations applied to those parts to obtain an improved, yet behaviorally equivalent program design. The complicated nature of the underlying optimization problem stems from several phenomena.


— Very large search-space due to the combinatorial explosion resulting 
from the many possible sequences of (potentially interdependent) refactoring- 
operation applications. 

— Multiple objectives including various (inherently contradicting) refactoring 
goals (e.g., O1–O3).

— Many invalid solutions due to (generally very complicated) constraints to 
be imposed for ensuring behavior preservation. 


Further research, especially on the last phenomenon, is required to understand to what extent a refactoring actually alters (in a potentially critical way) the original program. For instance, for refactoring R2 to yield a correct result, declared accessibility constraints must be relaxed: method encryptMessage() has to become public instead of protected after being moved into class Contact to remain accessible for method sendMessage(), and, conversely, method getPrivateKey() has to become public instead of private to remain accessible for encryptMessage(). Although these small changes do not affect the functionality of the original program, they may have a negative impact on extra-functional properties like program security. Therefore, the number of invalid solutions strongly depends on the interaction between constraints and repair mechanisms.


Attack Surface of Object-Oriented Programs. The attack surface of a pro- 
gram comprises all conventional ways of entering a software from outside such 
that a larger surface increases the danger of exploiting vulnerabilities (either 
unintentionally by some user, or intentionally by an attacker) [20]. Concerning JAVA-like programs in particular, explicit restrictions of accessibility of class members provide an essential mechanism to control the attack surface. Hence, refactoring R2 should definitely be considered harmful, as the enforced relaxations of accessibility constraints, especially that of the security-critical method getPrivateKey(), unnecessarily widen the attack surface of the original program. In contrast, refactoring R1 should be appreciated, as it even narrows the attack surface by setting method plainToHtml() from public to private.


Challenges. As illustrated by our example, the attack surface of a program is a crucial, but as yet unexplored, factor when searching for reasonable object-oriented program refactorings. However, if not treated with special care, accessibility constraints may seriously obstruct program maintenance by eagerly suppressing any refactoring opportunity in advance. We therefore pursue a model-based methodology for automating the search for optimal sequences of program refactorings by explicitly taking accessibility constraints into account. We formulate the underlying problem as a constrained multi-objective optimization problem (MOOP) incorporating explicit control and minimization of attack-surface metrics. This framework allows us to use search-based model-transformation capabilities for approximating optimal solutions.


3 Search-Based Program Refactorings with Attack-Surface Control


We now describe our model-based framework for identifying (presumably) opti- 
mal sequences of object-oriented refactoring operations. To explicitly control 
(and minimize) the impact of recommended refactorings on the attack surface, 
we extend an existing EMF meta-model for representing JAVA-like programs 
with accessibility information and respective constraints. Based on this model, 
refactoring operations are defined as model-transformation rules which allow 
us to apply search-based model-transformation techniques to effectively explore 
candidate solutions of the resulting MOOP. 
3.1 Program Model


In the context of model-based program transformation, a program model serves as a unified program representation (1) constituting an appropriate level of abstraction that comprises only the (syntactic) program entities relevant for a given task, and (2) including additional (static semantic) information required for that task [24]. For program models used in model-based object-oriented program refactorings in particular, the corresponding model-transformation operations are mostly applied at the level of classes and members, whereas more fine-grained source code details can be neglected. Instead, program elements are augmented with additional (static semantic) dependencies to other entities that are crucial for refactoring operations to yield correct results [24-26]. Here, we employ and enhance the program model proposed by Peldszus et al. [33] for automatically detecting structural anti-patterns (cf. O1b) in JAVA programs. Their incremental detection process also includes evaluation of coupling and cohesion metrics (cf. O1a), and both the metric values and the detected anti-patterns are added as additional information into the program model.
Fig. 2. Excerpt of the program-model representation of MailApp


Figure 2 shows an excerpt of the program-model representation for MailApp 
including the classes MailApp, Message, SecureMailApp, and Contact together 
with a selection of their method definitions. Each program element is repre- 
sented by a white rectangle labeled with name : type. The available types 
of program entities and possible (syntactic and semantic) dependencies (rep- 
resented by arrows) between respective program elements are defined by a 
program meta-model, serving as a template for valid program models [26,37]. 
The program model comprises as first-class entities the classes (type TClass) together with their members as declared in the program. The representation of methods is split into signatures (type TMethodSignature) and definitions (type TMethodDefinition) to capture overloading/overriding dependencies among method declarations (e.g., overriding of method sendMessage() imposes one shared method signature, but two different method definitions). Solid arrows correspond to syntactic dependencies between program elements, such as aggregation (unlabeled), inheritance (label extends), and relations between method signatures and their definitions, whereas dashed arrows represent (static) semantic dependencies (e.g., arrows labeled with call denote caller-callee relations between methods).

[Figure: Rule moveMethod(srcClass: TClass, trgClass: TClass, methodSig: TMethodSignature) over sourceClass: TClass, targetClass: TClass and methodSig: TMethodSignature, with post-constraint forAll(m: Members): m.accessibility >= reqAcc(m)]
Fig. 3. Model-transformation rule for MoveMethod refactoring


Design-Flaw Information. The program model further incorporates informa- 
tion gained from design-flaw detection [33], to identify program parts to be refac- 
tored. In our example, design-flaw annotations (in gray) are attached to affected 
program elements, namely classes Message and Contact constitute data classes 
and classes MailApp and SecureMailApp constitute controller classes, which lead 
to two instances of the anti-pattern The Blob. 


Accessibility Information. To reason about the impact of refactorings on the 
attack surface of programs, we extend the program model of Peldszus et al. by 
accessibility information. Our extensions include the attribute accessibility 
denoting the declared accessibility of entities as shown for method definitions in 
Fig. 2. In addition, our model comprises package declarations of classes (type 
TPackage) to reason about package-dependent accessibility constraints. 


3.2 Model-Based Program Refactorings 


Based on the program-model representation, refactoring operations by means 
of semantic-preserving program transformations can be concisely formalized in 
a declarative manner in terms of model-transformation rules [26]. A model- 
transformation rule specifies a generic change pattern consisting of a left-hand side 
pattern to be matched in an input model for applying the rule, and a right-hand 
side replacing the occurrence of the left-hand side to yield an output model. Here, 
we focus on (sequences of) MoveMethod refactorings, as recent research has shown that MoveMethod refactorings are particularly effective in improving CRA measures in flawed object-oriented program designs [34]. Figure 3 shows a (simplified) rule for MoveMethod refactorings defined on our program meta-model, using a compact visual notation superimposing the left- and right-hand side. The rule takes a source class srcClass, a target class trgClass and a method signature methodSig as parameters, deletes the containment arrow between source class and signature (red arrow annotated with --) and creates a new containment arrow from the target class (green arrow annotated with ++), only if such an arrow does not already exist before rule application. The latter (pre-)condition is expressed by a forbidden (crossed-out) arrow. For a comprehensive list of all necessary pre-conditions (or, pre-constraints), we refer to [38].
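A minimal Python sketch of the MoveMethod rule, under simplifying assumptions of our own: the program model is reduced to containment arrows from classes to method-signature names (`ProgramModel` and `move_method` are hypothetical names), and neither the further pre-conditions listed in [38] nor the accessibility post-constraint are modeled. The left-hand-side match, the forbidden arrow, and the delete/create steps mirror the elements of Fig. 3.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class ProgramModel:
    """Toy program model: class name -> names of contained method signatures."""
    contains: Dict[str, Set[str]] = field(default_factory=dict)

def move_method(model: ProgramModel, src_class: str, trg_class: str, method_sig: str) -> bool:
    """Apply a simplified MoveMethod rule in place; return False if the rule does not match."""
    # Left-hand side: srcClass must contain methodSig (this arrow is deleted, "--").
    if method_sig not in model.contains.get(src_class, set()):
        return False
    # Forbidden (crossed-out) arrow: trgClass must exist and must not yet contain methodSig.
    if trg_class not in model.contains or method_sig in model.contains[trg_class]:
        return False
    model.contains[src_class].remove(method_sig)  # delete containment arrow ("--")
    model.contains[trg_class].add(method_sig)     # create containment arrow ("++")
    return True
```

Moving plainToHtml from MailApp to Message (refactoring R1) succeeds once; a second attempt fails because the signature is no longer contained in the source class.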


Accessibility Post-constraints. Besides pre-constraints, refactoring operations must satisfy further post-constraints, evaluated after rule application, to yield correct results, especially concerning accessibility constraints as declared in the original program (i.e., member accesses like method calls in the original program must be preserved after refactoring [24]). As an example,
a (simplified) post-constraint for the MoveMethod rule is shown on the right 
of Fig. 3 using OCL-like notation. Members refers to the collection of all class 
members in the program. The post-constraint utilizes helper-function reqAcc(m) 
to compute the required access modifier of class member m and checks whether 
the declared accessibility of m is at least as generous as required (based on the 
canonical ordering private < default < protected < public) [38]. 

For instance, if refactoring R2 is applied to MAILAPP, method encryptMessage() violates this post-constraint, as the call from sendMessage() in another package requires accessibility public, whereas the declared accessibility is protected. Instead of immediately rejecting refactorings like R2, we introduce an accessibility-repair operation of the form m.accessibility := reqAcc(m) for each member violating the post-constraint, which consequently widens the attack surface. However, this repair is not always possible, as relaxations may
lead to incorrect refactorings altering the original program semantics (e.g., due 
to method overriding/overloading [38]). In contrast, refactoring R1 (i.e., mov- 
ing plainToHtml() to class Message) satisfies the post-constraint as the required 
accessibility of plainToHtml() becomes private, whereas its declared accessibil- 
ity is public. In those cases, we may also apply the operation m.accessibility := 
reqAcc(m), now leading to a reduction of the attack surface. Different strategies 
for attack-surface reduction will be investigated in Sect. 4. 
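The accessibility check and the m.accessibility := reqAcc(m) operation can be sketched in Python as follows. We assume that reqAcc(m) has already been computed for every member (computing it requires the call, inheritance and package analysis described in [38], which is not shown), and we do not model the cases in which a relaxation is impossible because of overriding/overloading. `Access`, `check_post_constraint` and `apply_req_acc` are hypothetical names.

```python
from enum import IntEnum
from typing import Dict, List, Tuple

class Access(IntEnum):
    """Canonical ordering of access modifiers: private < default < protected < public."""
    PRIVATE = 0
    DEFAULT = 1
    PROTECTED = 2
    PUBLIC = 3

def check_post_constraint(declared: Dict[str, Access], required: Dict[str, Access]) -> List[str]:
    """Members m violating the post-constraint m.accessibility >= reqAcc(m)."""
    return sorted(m for m, acc in declared.items() if acc < required[m])

def apply_req_acc(declared: Dict[str, Access], required: Dict[str, Access],
                  reduce: bool) -> Tuple[Dict[str, Access], List[str], List[str]]:
    """Set m.accessibility := reqAcc(m): always for violations (relaxation, widens AS),
    and for over-generous members only if `reduce` is enabled (reduction, narrows AS)."""
    new = dict(declared)
    relaxed, reduced = [], []
    for m, acc in declared.items():
        req = required[m]
        if acc < req:
            new[m] = req
            relaxed.append(m)
        elif acc > req and reduce:
            new[m] = req
            reduced.append(m)
    return new, relaxed, reduced
```

With the values from refactoring R2 (encryptMessage() declared protected but required public, getPrivateKey() declared private but required public), both members violate the post-constraint and are relaxed; for R1, plainToHtml() satisfies it and is tightened to private only when reduction is enabled.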


3.3 Optimization Objectives 


We now describe the evaluation of objectives (O1)–(O3) on the program model,
to serve as fitness values in a search-based setting. 


Coupling/Cohesion. Concerning (O1a), coupling and cohesion metrics are well-established quality measures for CRA decisions in object-oriented program design [4]. In our program model, coupling (COU) is related to the overall number of member accesses (e.g., call-arrows) across class boundaries [5], and for measuring cohesion, we adopt the well-known LCOM5 metric to quantify lack of cohesion among members within classes [17]. While there are other metrics indicating good CRA decisions, such as Number of Children, these metrics cannot be changed by MoveMethod refactorings and are therefore not used in this paper [9]. Good CRA decisions exhibit low values for both COU and LCOM5. Hence, refactorings R1 and R2 both improve the values of COU (i.e., by eliminating inter-class call-arrows) and LCOM5 (i.e., by moving methods into classes where they are called).
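For illustration, a Python sketch of both measures. The text does not spell out the exact formulas; we assume COU to be a plain count of call arrows whose caller and callee belong to different classes, and LCOM5 to follow the standard Henderson-Sellers definition LCOM5 = ((1/a) Σ_j μ(A_j) − m) / (1 − m), where m is the number of methods, a the number of attributes, and μ(A_j) the number of methods accessing attribute A_j. The tool's actual implementation may differ.

```python
from typing import Dict, List, Set, Tuple

def coupling(calls: List[Tuple[str, str]], owner: Dict[str, str]) -> int:
    """COU sketch: number of (caller, callee) call arrows crossing a class boundary."""
    return sum(1 for caller, callee in calls if owner[caller] != owner[callee])

def lcom5(methods: List[str], attributes: List[str], accesses: Dict[str, Set[str]]) -> float:
    """Henderson-Sellers LCOM5 of one class: ((1/a) * sum_j mu(A_j) - m) / (1 - m)."""
    m, a = len(methods), len(attributes)
    if m <= 1 or a == 0:
        return 0.0  # formula undefined here; treated as fully cohesive (our assumption)
    mu_sum = sum(
        sum(1 for meth in methods if attr in accesses.get(meth, set()))
        for attr in attributes
    )
    return (mu_sum / a - m) / (1 - m)
```

Moving plainToHtml() into Message turns its call to getPlainText() into an intra-class call, so COU drops from 1 to 0 in this toy setting; a class whose two methods each touch a different attribute gets the worst LCOM5 value of 1.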


Anti-patterns. Concerning (O1b), we limit our considerations to occurrences of The Blob anti-pattern for convenience. We employ the detection approach of Peldszus et al. [33] and consider as an objective the minimization of the number of The Blob instances (denoted #BLOB). For instance, for the original MailApp program (white parts in Fig. 1), we have #BLOB = 1, while for the extended version (white and gray parts), we have #BLOB = 2. Refactoring R1 may help to remove the first occurrence, and R2 potentially removes the second one.


Changes. Concerning (O2), real-life studies show that refactoring recommendations must avoid too large a deviation from the original design to be accepted by users [8]. Here, we consider the number of MoveMethod refactorings (denoted #REF) to be performed in a recommendation as a further objective to be minimized. For example, solely applying R1 results in #REF = 1, whereas a sequence of R1 followed by R2 most likely imposes more design changes (i.e., #REF = 2). In contrast, accessibility-repair operations do not affect the value of #REF, but rather impact objective (O3).


Attack Surface. Concerning (O3), the guidelines for secure object-oriented programming encourage developers to grant as few access privileges as possible to any accessible program element to minimize the attack surface [19]. In our program model, the attack-surface metric (denoted AS) is measured as


$$AS = \sum_{m \in Members} w(m.accessibility), \qquad (1)$$


where the weighting function $w\colon Mod \to \mathbb{N}_0$ on the set Mod of accessibility modifiers may, for instance, be defined as w(private) = 0, w(default) = 1, w(protected) = 2, w(public) = 3. Hence, a lower value corresponds to a smaller attack surface. For example, R1 enables an attack-surface reduction by setting plainToHtml() from public to private, which decreases AS by 3. In contrast, R2 involves a repair step setting encryptMessage() from protected to public, which increases AS by 1. Whether such negative impacts of refactorings on (O3) are outweighed by simultaneous improvements gained for other objectives depends, among other things, on the actual weighting w applied. For instance, each further modifier public considerably opens the attack surface and should therefore be penalized by a higher weight than the other modifiers (cf. Sect. 4).
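Equation (1) translates directly into code. The following Python sketch uses the example weighting given above and tracks only the two members discussed in the example (the real metric sums over all class members, and the relaxation of getPrivateKey() is left out, as in the computation above); member names are taken from MAILAPP, while the dictionaries themselves are illustrative.

```python
from typing import Dict

# Example weighting from the text: w(private)=0, w(default)=1, w(protected)=2, w(public)=3.
DEFAULT_WEIGHTS: Dict[str, int] = {"private": 0, "default": 1, "protected": 2, "public": 3}

def attack_surface(accessibility: Dict[str, str], w: Dict[str, int] = DEFAULT_WEIGHTS) -> int:
    """AS = sum over class members m of w(m.accessibility), cf. Eq. (1)."""
    return sum(w[acc] for acc in accessibility.values())

before = {"plainToHtml": "public", "encryptMessage": "protected"}
after_r1 = {**before, "plainToHtml": "private"}    # R1 followed by attack-surface reduction
after_r2 = {**before, "encryptMessage": "public"}  # R2 followed by accessibility repair
```

Here AS goes from 5 to 2 for R1 and from 5 to 6 for R2. Raising w(public) to a hypothetical 10 makes the same repair step in R2 cost 8 instead of 1, which shows how the choice of w shifts the trade-off between (O3) and the other objectives.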


3.4  Search-Based Optimization Process 


Our tool for recommending optimized object-oriented refactoring sequences, called GOBLIN², is based on a combination of search-based multi-objective optimization techniques using genetic algorithms and model transformations on the basis of the MOMoT framework [11]. Figure 4 shows an overview of GOBLIN. First, the input JAVA program is translated into our program model [33]. This original program model, together with its objective values for (O1)–(O3) (i.e., its fitness values), serves as a baseline for evaluating the improvements obtained by candidate refactorings. The built-in genetic algorithm (NSGA-III) of MOMoT is initialized with a population of a fixed number of individuals serving as generation 0, where each individual constitutes a sequence of at least 1 up to a maximum number of MoveMethod rule applications (cf. Fig. 3) to the original program model. Thus, each individual corresponds to a refactored version of the original program model on which the resulting fitness values are evaluated. The refactored program model is obtained by applying the given sequence of refactorings to the original program model. Steps within a sequence that are not applicable to an intermediate model (e.g., due to unsatisfied pre-conditions) are skipped, whereas steps producing infeasible results (e.g., due to unsatisfied and non-repairable post-conditions) cause the entire individual to become invalid (thus being removed from the population).

² Goblin is a supervillain and Head of National Security in the Marvel universe [3]. GOBLIN also stands for Generic Objective-Based Layout Improvements for Non-designs.
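The evaluation of a single individual can be sketched in Python as follows. This is not MOMoT's API: the status constants, the `Step` type, and the `move` step factory are hypothetical, and a program model is reduced to a map from methods to their owning classes. Inapplicable steps are skipped, while one infeasible step invalidates the whole individual.

```python
from typing import Callable, Dict, List, Optional, Tuple

APPLIED, NOT_APPLICABLE, INFEASIBLE = "applied", "not_applicable", "infeasible"

Model = Dict[str, str]                       # method name -> owning class (toy model)
Step = Callable[[Model], Tuple[str, Model]]  # returns (status, resulting model)

def evaluate_individual(original: Model, steps: List[Step]) -> Optional[Model]:
    """Apply an individual's refactoring steps to a copy of the original model.

    Steps that do not match are skipped; a single infeasible step makes the
    whole individual invalid, signalled by returning None."""
    model = dict(original)
    for step in steps:
        status, result = step(model)
        if status == NOT_APPLICABLE:
            continue
        if status == INFEASIBLE:
            return None
        model = result
    return model

def move(method: str, src: str, trg: str, repairable: bool = True) -> Step:
    """Hypothetical MoveMethod step; repairable=False simulates a non-repairable post-constraint."""
    def step(model: Model) -> Tuple[str, Model]:
        if model.get(method) != src:  # pre-condition not satisfied: skip this step
            return NOT_APPLICABLE, model
        if not repairable:            # post-constraint violated and not repairable
            return INFEASIBLE, model
        moved = dict(model)
        moved[method] = trg
        return APPLIED, moved
    return step
```

Applying R1 twice leaves only one effective move because the second application no longer matches; a non-repairable step yields None, i.e., the individual is discarded, and the original model is never modified.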


[Figure: the original program model and candidate individuals, each a sequence of refactorings (R1 | R2 | R3, ...) annotated with fitness values for (O1)–(O3); crossover and mutation create new individuals, followed by selection of generation i + 1]

Fig. 4. Architecture of the GOBLIN tool


For deriving generation i + 1 from generation i, NSGA-III first creates a set of new individuals using random crossover and mutation operators. As indicated in Fig. 4, a crossover splits and recombines two individuals into a new one, while a mutation generates a new individual by injecting small changes into an existing one. Afterwards, in the selection phase, individuals from the overall population (the original and the newly created individuals) are selected into the next generation depending on their fitness values. For more details on NSGA-III, we refer to [15,28]. The search process terminates when a maximum number of generations (or of evaluated individuals, respectively) has been reached, resulting in a Pareto front of non-dominated individuals, each constituting a refactoring recommendation [11].
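As a minimal sketch of the Pareto-front notion used above, assuming every objective is expressed so that lower values are better, the following Python code checks Pareto dominance and filters the non-dominated individuals. It deliberately omits NSGA-III's reference-point-based niching and is not MOMOT's implementation; the fitness vectors are invented.

```python
from typing import List, Sequence, Tuple

Fitness = Tuple[float, ...]  # one value per objective, lower is better (assumed)

def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(population: Sequence[Fitness]) -> List[Fitness]:
    """Keep the individuals not dominated by any other (O(n^2) comparisons)."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]

# Hypothetical fitness vectors, e.g. (attack surface, coupling).
population = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
print(pareto_front(population))  # [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0)]
```

Here (3.0, 3.0) is dropped because (2.0, 2.0) is better in both objectives, while the remaining three vectors each represent a different trade-off.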
4 Experimental Evaluation


We now present experimental evaluation results gained from applying GOB- 
LIN to a collection of JAVA programs. First, to investigate the impact of attack- 
surface reduction on the resulting refactoring recommendations, we consider the 
following reduction strategies, differing in when to perform attack-surface reduc- 
tion during search-space exploration (where step means a refactoring step): 


— Strategy 1: A priori reduction. Before the first and after the last step. 
— Strategy 2: A posteriori reduction. Only after the last step. 
— Strategy 3: Continuous reduction. After every refactoring step. 


We are interested in the impact of each strategy on the trade-off between attack-surface metrics and design-quality metrics (i.e., whether the recommended refactoring sequences tend to optimize the attack surface or the program design more). We quantify the attack-surface impact (ASI) and design impact (DI) of a refactoring recommendation rr as follows:


$$\mathrm{ASI}(rr) = \frac{AS(rr) - AS(orig)}{AS(orig)} \qquad (2)$$

$$\mathrm{DI}(rr) = \frac{COU(rr) - COU(orig)}{COU(orig)} + \frac{LCOM5(rr) - LCOM5(orig)}{LCOM5(orig)} \qquad (3)$$
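The following sketch computes ASI and DI, reading Eq. (2) as the relative change of the attack-surface metric AS and Eq. (3) as the sum of the relative changes of coupling (COU) and LCOM5 with respect to the original program. Assuming lower metric values are better, negative results indicate improvements. The function names and metric values are hypothetical.

```python
def relative_change(new: float, old: float) -> float:
    """(new - old) / old, relative to the original program's value."""
    if old == 0:
        raise ValueError("baseline value must be non-zero")
    return (new - old) / old

def asi(as_rr: float, as_orig: float) -> float:
    """Attack-surface impact, one reading of Eq. (2)."""
    return relative_change(as_rr, as_orig)

def di(cou_rr: float, cou_orig: float, lcom5_rr: float, lcom5_orig: float) -> float:
    """Design impact, one reading of Eq. (3): relative changes of COU and LCOM5, summed."""
    return relative_change(cou_rr, cou_orig) + relative_change(lcom5_rr, lcom5_orig)

# Hypothetical metric values for one recommendation rr and the original program.
print(asi(95.0, 100.0))          # attack surface reduced by 5 %
print(di(8.0, 10.0, 0.5, 0.4))   # coupling improves, lack of cohesion worsens
```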


where orig refers to the original program. Second, we consider the impact of different weightings w on the attack-surface metric AS. As modifier public has a considerably negative influence on the attack surface, we study the impact of increasing the penalty for public in w, as compared to the other modifiers. We are especially interested in whether there exists a threshold beyond which any design-improving refactoring would be rejected as security-critical. Finally, we compare GOBLIN to the recent refactoring tools JDEODORANT and CODE-IMP, neither of which so far explicitly considers attack-surface metrics as an optimization objective. To summarize, we aim to answer the following research questions:


— (RQ1: Objective Trade-Off) Which attack-surface reduction strategy 
offers the best trade-off between attack-surface impact and design impact 
when taking the original program as a baseline? 

— (RQ2: Weighting of Attack Surface) Which weighting of public in the 
attack-surface metric constitutes a critical threshold obstructing any design- 
improving refactorings? 

— (RQ3: Tool Comparison) Which tool provides the best trade-off between 
attack-surface impact and design impact in refactoring recommendations? 


4.1 Experiment Setup and Results 


We conducted our experiments on an established corpus of real-life open-source
JAVA programs of various sizes [33,39], as listed in Table 1 (with lines of code


Attack Surface of OO Refactorings 49 


LOC, number of packages #P, number of classes #C and number of methods #M). For a compact presentation, we divide the corpus into three program-size categories (small, mid-sized, large), indicated by horizontal lines in Table 1. All experiments have been executed on a Windows Server 2016 machine with a 2.4 GHz quad-core CPU, 32 GB RAM and JRE 1.8. We used the default genetic-algorithm configuration of MOMOT in all our experiments [11]: termination after 10,000 individual evaluations, a population size of 100, and each individual consisting of at most 10 refactorings. We applied the metrics for (O1)–(O3) (cf. Sect. 3.3) to compute fitness values. GOBLIN requires 25 min to compute a set of refactoring recommendations for the smallest program and up to several hours for large programs, which is acceptable for a search-based (offline) optimization approach. We selected a representative set of computed recommendations, which were manually checked for program correctness and impact.

For (RQ1), we measured ASI and DI values for two runs of GOBLIN (cf. Figs. 6a–f). Figures 6a and 6b (first row, side by side) show a box-plot for each strategy (1–3) for the small programs of our corpus (#iSj refers to program number i in Table 1 and Strategy j). The box-plots show the distribution of ASI (Fig. 6a) and DI (Fig. 6b) values over all refactoring recommendations of GOBLIN. The figure pairs 6c/6d and 6e/6f show the same data for mid-sized and large programs, respectively. For (RQ2), we used Strategy 3 from (RQ1) and varied function w to study different penalties for modifier public. Figure 5 plots the (minimal) values of ASI and DI depending on w(public) (from 3 up to 100). Regarding (RQ3), we compare the results of GOBLIN to those of the state-of-the-art refactoring recommender tools JDEODORANT [12] and CODE-IMP [27]. Refactorings proposed by JDEODORANT have the single optimization objective of eliminating specific anti-patterns through heuristic refactoring strategies. In particular, JDEODORANT employs ExtractClass [13] to eliminate The Blob (also called GodClass) by moving parts of the controller class into a freshly created class. Thus, each recommendation of JDEODORANT subsumes multiple MoveMethod refactorings (into the fresh target class). In contrast, CODE-IMP pursues a search-based approach, including a variety of refactoring operations and design-quality metrics. For a comparison to GOBLIN, we used the MoveMethod refactoring of CODE-IMP, which produces one sequence of MoveMethod refactorings per run. Figures 6g and 6h compare ASI and DI values, respectively, for our corpus (excluding QUICKUML due to its comparatively very high variation). For each program, the upper box-plot shows the results for GOBLIN and the lower one for JDEODORANT. CODE-IMP successfully produced results only for QUICKUML and JUNIT (10 runs each) and terminated without any result for the others.

Table 1. Evaluation Corpus

  #  Program    Version      LOC   #P   #C     #M
  1  QuickUML   2001       2,667    1   19    175
  2  JSciCalc   2.1.0      5,437    3  121    563
  3  JUnit      3.8.2      5,780   11  105    841
  4  Gantt      1.10.2    21,331   28  256  1,925
  5  Nutch      0.9       21,437   24  273  1,750
  6  Lucene     1.4.3     25,472   15  276  1,750
  7  log4j      1.2.17    31,429   35  394  3,240
  8  JHotDraw   7.6       31,434   24  312  3,781

[Figure: min(ASI) and min(DI) plotted over w(public); y-axis from -0.004 to 0]

Fig. 5. Minimal ASI and DI values for different weightings of public

Fig. 6. Measurement results


4.2 Discussion 


Concerning (RQ1), Strategy 3 leads to the best attack-surface impact for small programs (with negligible execution-time overhead), while even slightly improving the design impact. Although this clear advantage dissolves for mid-sized and large programs, Strategy 3 still yields a reasonable trade-off, while attack-surface reductions tend to hamper design improvements, as expected. Calculating the Pearson correlation [32] between ASI and DI shows that (1) the strategy does not influence the correlation and (2) for small programs, GOBLIN finds refactorings that are beneficial for both the attack surface and the program design.
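The correlation analysis above can be illustrated with a short sketch of the Pearson coefficient r over paired ASI and DI values of refactoring recommendations. The sample values in the usage line are invented; Python 3.10 and later also provide `statistics.correlation` for the same computation.

```python
import math
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r of two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (ASI, DI) pairs of four recommendations.
print(pearson([-0.004, -0.002, 0.0, 0.001], [-0.03, -0.01, 0.0, 0.02]))
```

A value close to +1 means that recommendations improving one objective also tend to improve the other (both values negative), whereas a value close to -1 signals a trade-off.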

Concerning (RQ2), Fig. 5 shows that a higher value for w(public) leads to a 
better attack-surface impact, as attack-surface-critical refactorings are less likely 
to survive throughout generations. The increase in ASI is remarkably steep from 
w(public) = 3 to w(public) = 7, but exhibits slow linear growth for higher values. 
Regarding the design impact, up to w(public) = 10, the best achieved DI also 
grows linearly, but afterwards, no more DI improvements emerge. In higher value 
ranges (>70), DI reaches a threshold, and degrades afterwards. 

Regarding (RQ3), the strategy of JDEODORANT for eliminating The Blob necessarily increases attack surfaces, as calls to extracted methods have to access the new class, thus increasing accessibility at least up to default (package-private) visibility. As also shown in Fig. 6g, JDEODORANT proposes almost no refactorings with a positive attack-surface impact. Surprisingly, JDEODORANT also achieves a less beneficial design impact than GOBLIN, with a strong correlation between ASI and DI. Our (unfortunately very limited) set of observations for CODE-IMP shows that, due to the similar search technique, the refactorings found by CODE-IMP and GOBLIN are quite similar. Nevertheless, due to its different focus of objectives, CODE-IMP tends to increase attack surfaces. Although the differences in metric definitions preclude any definite conclusions, CODE-IMP does not achieve any design improvements according to our metrics.

To summarize, our experimental results demonstrate that the attack-surface impacts of refactorings clearly deserve more attention in the context of refactoring recommendations, revealing a practically relevant trade-off (or even contradiction) between traditional design-improvement efforts and extra-functional (particularly, security) aspects. Our experiments further reveal that existing tools are mostly unaware of the attack-surface impacts of recommended refactorings.
5 Related Work


Automating Design-Flaw Detection and Refactorings. Marinescu proposes a metric-based design-flaw detection approach similar to that of Peldszus et al. [33], which is used in our work. However, neither work deals with the elimination of detected flaws [21]. In contrast, the DECOR framework also includes recommendations for eliminating anti-patterns; unlike in our work, however, those recommendations remain rather atomic and local. More closely related to our approach, Fokaefs et al. [12] and Tsantalis et al. [40] consider (semi-)automatic refactorings to eliminate anti-patterns like The Blob in the tool JDEODORANT. Nevertheless, they focus on optimizing a single objective and do not consider multiple, especially extra-functional, aspects like security metrics as in our approach.


Multi-objective Search-Based Refactorings. O’Keeffe and Ó Cinnéide use search-based refactorings in their tool CODE-IMP [28], including various standard refactoring operations and different quality metrics as objectives [27]. Seng et al. consider a search-based setting where, similar to our approach, compound refactoring recommendations comprise atomic MoveMethod operations. Harman and Tratt also investigate a Pareto front of refactoring recommendations including various design objectives [16], and, more recently, Ouni et al. conducted a large-scale real-world study on multi-objective search-based refactoring recommendations [30]. However, none of these approaches investigates the impact of refactorings on security-relevant metrics as in our approach.


Security-Aware Refactorings. Steimann and Thies were the first to pro- 
pose a comprehensive set of accessibility constraints for refactorings covering 
full JAVA [38]. Although their constraints are formally founded, they do not 
consider software metrics to quantify the attack surface impact of (sequences 
of) refactorings. Alshammari et al. propose an extensive catalogue of software 
metrics for evaluating the impact of refactorings on program security of object- 
oriented programs [1]. Similarly, Maruyama and Omori propose a technique [22] 
and tool [23] for checking if a refactoring operation raises security issues. How- 
ever, all these approaches are concerned with security and accessibility con- 
straints of specific refactorings, but they do not investigate those aspects in a 
multi-objective program optimization setting. The problem of measuring attack 
surfaces serving as a metric for evaluating secure object-oriented programming 
policies has been investigated by Zoller and Schmolitzky [41] and Manadhata 
and Wing [20], respectively. Nevertheless, those and similar metrics have not yet been utilized as an optimization objective for program refactoring. Finally, Ghaith and Ó Cinnéide consider a catalogue of security-relevant metrics to recommend refactorings using CODE-IMP, but they also treat security as a single objective [14].




6 Conclusion 


We presented a search-based approach to recommend sequences of refactorings for object-oriented JAVA-like programs that takes the attack surface into account as an additional optimization objective. Our model-based methodology, implemented in the tool GOBLIN, utilizes the MOMOT framework including the genetic algorithm NSGA-III for search-space exploration. Our experimental results, gained from applying GOBLIN to real-world JAVA programs, provide detailed insights into the impact of attack-surface metrics on the fitness values of refactorings and into the resulting trade-off with competing design-quality objectives. As future work, we plan to incorporate additional domain knowledge about critical code parts to further control security-aware refactorings.
Abstract. Attack trees (ATs) are a popular formalism for security analysis, and numerous variations and tools have been developed around
them. These were mostly developed independently, and offer little inter- 
operability or ability to combine various AT features. 

We present ATTop, a software bridging tool that enables automated analysis of ATs using a model-driven engineering approach. ATTop fulfills two purposes: (1) it facilitates interoperation between several AT analysis methodologies and resulting tools (e.g., ATE, ATCalc, ADTool 2.0), and (2) it can perform a comprehensive analysis of attack trees by translating them into timed automata, analyzing them with the popular model checker UPPAAL, and translating the analysis results back to the original ATs. Technically, our approach uses various metamodels to provide a unified description of AT variants. Based on these metamodels, we perform model transformations that allow us to apply various analysis methods to an AT and trace the results back to the AT domain. We illustrate our approach on the basis of a case study from the AT literature.


1 Introduction 


Formal methods are often employed to support software engineers in particularly 
complex tasks: model-based testing, type checking and extended static checking 
are typical examples that help in developing better software faster. This paper is 
about the reverse direction: showing how software engineering can assist formal 
methods in developing complex analysis tools. 

More specifically, we reap the benefits of model-driven engineering (MDE) 
to design and build a tool for analyzing attack trees (ATs). ATs [25,31] are 
a popular formalism for security analysis, allowing convenient modeling and 
analysis of complex attack scenarios. ATs have become part of various system 
engineering frameworks, such as UMLsec [16] and SysMLsec [27]. 

Attack trees come in a large number of variations, employing different secu- 
rity attributes (e.g., attack time, costs, resources, etc.) as well as modeling con- 
structs (e.g., sequential vs. parallel execution of scenarios). Each of these vari- 
ations comes with its own tooling; examples include ADTool [12], ATCalc [2], 
and Attack Tree Evaluator [5]. This “jungle of attack trees” seriously hampers the applicability of ATs, since it is impossible or very difficult to combine different features and tooling. This paper addresses these challenges and presents ATTop, a software tool that bridges existing tooling in the AT domain.

In particular, the main features of ATTop are (see Fig. 1): 


1. A unified input format that encompasses the known AT features. We have 
collected these features in one comprehensive metamodel. Following MDE 
best practices, this metamodel is extensible to easily accommodate future 
needs. 

2. Systematic model transformations. Many AT analysis methods are based on converting the AT into a mathematical model that can be analyzed with existing formal techniques, such as timed automata [11,23], Bayesian networks [13], Petri nets [8], etc. An important contribution of our work is to make these translations more systematic, and therefore more extensible, maintainable, reusable, and less error-prone.
To do so, we again draw on the concepts of MDE and deploy model transformations of two categories: so-called horizontal transformations achieve interoperability between existing tools, whereas vertical transformations interpret a model via a set of semantic rules to produce a mathematical model to be analyzed with formal methods.

3. Bringing the results back to the original domain. When a mathematical model 
is analyzed, the analysis result is computed in terms of the mathematical 
model, and not in terms of the original AT. For example, if AT analysis is 
done via model checking, a trace in the underlying model (i.e., transition 
system) can be produced to show that, say, the cheapest attack costs $100. 
What security practitioners need, however, is a path or attack vector in the 
original AT. This interpretation in terms of the original model is achieved by 
a vertical model transformation in the inverse direction, from the results as 
obtained in the analysis model back into the AT domain. 


These features make ATTop a software bridging tool, acting as a bridge 
between existing AT languages, and between ATs and formal languages. 


Our Contributions. The contributions of this paper include: 


— a full-fledged tool based on MDE, which allows for high maintainability and 
extensibility; 

— a unified input format, enabling interoperability between different AT
dialects;

— systematic use of model transformations, which increases reusability while
reducing error likelihood;

— a complete cycle from AT to formal model and back, allowing domain experts 
to profit from formal methods without requiring specific knowledge. 


Overview of Our Approach. Figure 1 depicts the general workflow of our approach. It shows how ATTop acts as a bridge between different languages and formalisms. In particular, thanks to horizontal transformations, ATTop makes it possible to use ATs described in different formats, both as an input to other tools and as an input to ATTop itself. In the latter case, vertical transformations are used in order to deal with UPPAAL as a back-end tool without exposing ATTop's users to the formal language of timed automata.


[Figure: ATE (binary AT), ATCalc (AT in Galileo format), and ADTool 2.0 (AT specified by adtree.xsd) are connected to ATTop via horizontal transformations; a vertical transformation maps the AT and the property of interest (e.g., cost-optimal attack vector) to timed automata and an UPPAAL query for the UPPAAL tool, and a second vertical transformation maps the resulting UPPAAL trace back to an attack vector in the AT]

Fig. 1. Overview of our approach, showing the contributions of the paper in the gray rectangle. Here, ATE, ATCalc, and ADTool 2.0 are different attack tree analysis tools, each with its own input format. ATTop allows these tools to be interoperable (horizontal model transformations, see Sect. 4.1). ATTop also provides a much more comprehensive AT analysis by automatically translating attack trees into timed automata and using UPPAAL as the back-end analysis tool (vertical transformations, see Sect. 4.2).


Related Work. A large number of AT analysis frameworks have been devel- 
oped, based on lattice theory [18], timed automata [11,21,23], I/O-IMCs [3,22], 
Bayesian networks [13], Petri nets [8], stochastic games [4,15], etc. We refer 
to [20] for an overview of AT formalisms. Surprisingly, little effort has been 
made to provide a security practitioner with a generic tool that integrates the 
benefits of all these analysis tools. 

The use of model transformations with UPPAAL was explored in [29] for a 
range of different formalisms; the UPPAAL metamodel that was presented there 
is the one we use in ATTop. A related approach for fault trees was proposed in 
[28]. In [14], the authors manually translate UML sequence diagrams into timed 
automata models to analyze timeliness properties of embedded systems. In [1], the OpenMADS tool is proposed, which takes SysML diagrams and UML/MARTE annotations as input and automatically translates them into deterministic and stochastic Petri nets (DSPNs); however, no model-driven engineering technique was applied.


Organization of the Paper. In Sect. 2, we describe the background. Section 3 
presents the metamodels we use in ATTop, while the model transformations are 
described in Sect. 4. Section 5 describes the features of ATTop, and in Sect. 6 we
show the results of our case study using ATTop. Finally, we conclude the paper 
in Sect. 7. 


2 Background 


2.1 Attack Trees in the Security Domain 


Modern enterprises are ever-growing, complex socio-technical systems comprising multiple actors, physical infrastructures, and IT systems. Adversaries can take advantage of this complexity by exploiting multiple security vulnerabilities simultaneously. Risk managers therefore need to predict possible attack vectors in order to combat them. For this purpose, attack trees are a widely used formalism to identify, model, and quantify complex attack scenarios.

Attack trees (ATs) were popularized by Schneier through his seminal paper in 
[31] and were later formalized by Mauw in [25]. ATs show how different attack 
steps combine into a multi-stage attack scenario leading to a security breach. 
Due to the intuitive representation of attack scenarios, this formalism has been 
used in both academia and industry to model practical case studies such as 
ATMs [10], SCADA communication systems [7], etc. Furthermore, the attack 
tree formalism has also been advocated in the Security Quality Requirements 
Engineering (SQUARE) [26] methodology for security requirements. 


Example 1. Figure 2 shows an example AT (adapted from [36]) modeling the compromise of an Internet of Things (IoT) device.

At the top of the tree is the event compromise_IoT_device, which is refined using gates until we reach the atomic steps where no further refinement is desired (the leaves of the tree). The top gate in Fig. 2 is a SAND (sequential AND) gate denoting that, in order for the attack to be successful, the children of this gate must be executed sequentially from left to right. In the example, the attacker first needs to successfully perform access_home_network, then exploit_software_vulnerability_in_IoT_device, and then run_malicious_script. The AND-gate at access_home_network represents that both gain_access_to_private_networks and get_credentials must be performed, but these can be performed in any order, possibly in parallel. Similarly, the OR-gate at gain_access_to_private_networks denotes that its children access_LAN and access_WLAN can be attempted in parallel, but only one needs to succeed for a successful attack.


Traditionally, each leaf of an attack tree is decorated with a single attribute, 
e.g., the probability of successfully executing the step, or the cost incurred when 
taking this step. The attributes are then combined in the analysis to obtain 
metrics, such as the probability or required cost of a successful attack [19]. 
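To make the combination of leaf attributes concrete, the following Python sketch evaluates a simplified version of the tree of Example 1 bottom-up. The gate rules are a common simplification for trees without shared subtrees, not ATTop's timed-automata analysis: for cost, AND and SAND sum their children and OR takes the minimum; for time, AND takes the maximum (parallel execution), SAND sums, and OR takes the minimum. All names and leaf values are hypothetical, since the actual values of Fig. 2 are not given in the text, and the two metrics are optimized independently.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    gate: Optional[str] = None      # "AND", "OR", "SAND"; None marks a leaf
    children: List["Node"] = field(default_factory=list)
    cost: float = 0.0               # leaf attributes (hypothetical values)
    time: float = 0.0

def min_cost(n: Node) -> float:
    """Cost of the cheapest successful attack on the subtree rooted at n."""
    if n.gate is None:
        return n.cost
    costs = [min_cost(c) for c in n.children]
    if n.gate == "OR":
        return min(costs)           # one child suffices: pick the cheapest
    if n.gate in ("AND", "SAND"):
        return sum(costs)           # all children must be performed
    raise ValueError(f"unknown gate {n.gate}")

def min_time(n: Node) -> float:
    """Duration of the fastest successful attack on the subtree rooted at n."""
    if n.gate is None:
        return n.time
    times = [min_time(c) for c in n.children]
    if n.gate == "OR":
        return min(times)           # attempted in parallel, first success wins
    if n.gate == "AND":
        return max(times)           # children may run in parallel
    if n.gate == "SAND":
        return sum(times)           # children run strictly left to right
    raise ValueError(f"unknown gate {n.gate}")

# Simplified tree of Example 1 with made-up leaf values.
gain_access = Node("gain_access_to_private_networks", "OR", [
    Node("access_LAN", cost=100.0, time=5.0),
    Node("access_WLAN", cost=50.0, time=8.0),
])
home = Node("access_home_network", "AND", [
    gain_access, Node("get_credentials", cost=80.0, time=12.0),
])
iot = Node("compromise_IoT_device", "SAND", [
    home,
    Node("exploit_software_vulnerability_in_IoT_device", cost=200.0, time=20.0),
    Node("run_malicious_script", cost=30.0, time=2.0),
])

print(min_cost(iot), min_time(iot))  # 360.0 34.0
```

Note that the cheapest attack uses access_WLAN while the fastest uses access_LAN; multi-attribute analyses such as those supported by ATTop aim to capture exactly such interactions.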

Over the years, the AT formalism has been enriched both structurally (e.g., adding more logical gates, countermeasures, ordering relationships; see [20] for an overview) and analytically (e.g., multi-attribute analysis, time- and cost-optimal analysis). This has resulted in a large number of tools (ADTool 2.0 [12], ATCalc [5], ATE [2], etc.), each with its own analysis technique.

[Figure: attack tree with root compromise_IoT_device and the children access_home_network, exploit_software_vulnerability_in_IoT_device, and run_malicious_script]

Fig. 2. Attack tree modeling the compromise of an IoT device. Leaves are equipped with the cost and time required to execute the corresponding step. The parts of the tree attacked in the cheapest successful attack are indicated by a darker color, with start and end times for the steps in this cheapest attack denoted in red (times correspond to the scenario in Fig. 11). (Color figure online)

Such a wide range of tools can be useful for a security practitioner to perform 
different kinds of analyses of attack trees. However, this requires preparing the 
AT for each tool, as each one has its own input format. To overcome the difficulty of orchestrating all these different tools, we propose a single tool, ATTop, that allows the specification of ATs combining features of multiple formalisms and supports the analysis of such ATs by different tools without duplicating the AT for each tool.


2.2 Model-Driven Engineering 


Model-driven engineering (MDE) is a software engineering methodology that 
treats models not only as documentation, but also as first-class citizens, to 
be directly used in the engineering processes [32]. In MDE, a metamodel (also 
referred to as a domain-specific language, DSL) is specified as a model at a more 
abstract level to serve as a language for models [33]. A metamodel captures 
the concepts of a particular domain with the permitted structure and behav- 
ior, to which models must adhere. Typically, metamodels are specified in class 
diagram-like structures. 

MDE provides interoperability between domains (and tools and technologies in these domains) via model transformations. The concept of model transformation is shown in Fig. 3. Model transformations map the elements of a source metamodel to the elements of a target metamodel. This mapping is described as a transformation definition, using a language specifically designed for this purpose. The transformation engine executes the transformation definition on the input model and generates an output model.

[Figure: a transformation definition maps the elements of the source metamodel to the elements of the target metamodel; the transformation engine executes it on an input model (an instance of the source metamodel) and produces an output model (an instance of the target metamodel)]

Fig. 3. The concept of model transformation
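The concept of Fig. 3 can be illustrated with a small Python sketch: a transformation definition is a set of rules keyed by source-metamodel type, and an engine executes them on an input model while recording a trace from source elements to target elements. Both metamodels (binary and n-ary attack trees) and all names are hypothetical; ETL has its own rule syntax, and this code only loosely mirrors the idea, including the role of a trace for mapping results back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union

# Hypothetical source metamodel: binary attack trees.
@dataclass
class BinLeaf:
    id: str

@dataclass
class BinGate:
    id: str
    op: str                                # "AND" or "OR"
    left: Union["BinGate", BinLeaf]
    right: Union["BinGate", BinLeaf]

# Hypothetical target metamodel: n-ary attack trees.
@dataclass
class NaryNode:
    id: str
    connector: Optional[str]               # None for a leaf
    children: List["NaryNode"] = field(default_factory=list)

class Engine:
    """Executes a transformation definition (one rule per source type)
    and records a trace from source ids to target elements."""
    def __init__(self, rules: Dict[type, Callable]):
        self.rules = rules
        self.trace: Dict[str, NaryNode] = {}

    def transform(self, elem) -> NaryNode:
        if elem.id not in self.trace:      # transform each element only once
            self.trace[elem.id] = self.rules[type(elem)](elem, self)
        return self.trace[elem.id]

def leaf_rule(leaf: BinLeaf, engine: Engine) -> NaryNode:
    return NaryNode(leaf.id, None)

def gate_rule(gate: BinGate, engine: Engine) -> NaryNode:
    children: List[NaryNode] = []
    for child in (gate.left, gate.right):
        target = engine.transform(child)
        # Flatten nested gates with the same operator into one n-ary gate;
        # the absorbed gate still keeps its own entry in the trace.
        if isinstance(child, BinGate) and child.op == gate.op:
            children.extend(target.children)
        else:
            children.append(target)
    return NaryNode(gate.id, gate.op, children)

# The transformation definition: source types mapped to rules.
BINARY_TO_NARY = {BinLeaf: leaf_rule, BinGate: gate_rule}

tree = BinGate("g1", "OR", BinGate("g2", "OR", BinLeaf("a"), BinLeaf("b")), BinLeaf("c"))
out = Engine(BINARY_TO_NARY).transform(tree)
print(out.connector, [c.id for c in out.children])  # OR ['a', 'b', 'c']
```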

Adoption of MDE provides various benefits [30,34,37], specifically:


1. Empowering domain experts with abstraction: With the introduction of metamodels and related tooling, domain experts can focus on modeling in the domain, while technical problems below the modeling level, such as low-level implementation details, are abstracted away from them.

2. Higher level of reusability: The models, metamodels and the tools based on 
them are high-level artifacts that can be reused by many projects targeting 
similar domains. Such reuse increases productivity and quality of the final 
product since the reused units are maintained and improved continuously. 

3. Interoperability: There can be various tools and technologies used in a domain, 
each having its own I/O formats. Model transformations provide interoper- 
ability between these tools and technologies. 


There are a number of tools available for realizing MDE. In this paper, we have used the Eclipse Modeling Framework (EMF) [35], a state-of-the-art framework developed for this purpose. EMF provides the Ecore format for defining the
metamodels and has many plug-ins to support the various functionalities related 
to MDE. The model transformations we present in this paper were implemented 
using the Epsilon Transformation Language (ETL) [17], which is one of the 
domain-specific languages provided by the Epsilon framework. We have chosen 
ETL since it is an easy-to-use language and allows users to inherit, import and 
reuse other Epsilon modules, which increases reusability. We use Java to select 
and execute the ETL transformations. 


3 Metamodels for Attack Tree Analysis 


ATTop uses three different metamodels to represent the attack tree domain concepts, all defined in the Ecore format. These are shown in Figs. 4, 5 and 6, in a notation similar to that of UML class diagrams. They show the domain classes and edges representing associations between classes. Edges denote references, containment, or supertype relations, each drawn with a distinct arrow style. Multiplicities are denoted between square brackets (e.g., [0..*] for unrestricted multiplicity).
1. The AT metamodel (ATMM) unifies several extensions of the attack tree
formalism including traditional attack trees [25,31], attack-defense trees [18], 
defense trees [6], etc. It consists of two parts: the Structure metamodel and 
the Values metamodel. Below we describe the most important design choices 
that led to the ATMM: 

— The ATMM represents the core, generic concepts of ATs, resulting in a 
minimal (and thus clean) metamodel that a domain expert can easily 
read, understand and use to create models. 

— The ATMM provides a lot of flexibility in specifying the relevant concepts 
by using string names and generic values. Concepts such as the Connector 
and the Edge are specified as abstract entities with a set of concrete 
instances. Therefore, new connectors and edges can easily be added to the 
metamodel without breaking existing model instances. The metamodel is 
designed to have good support for model operations, such as traversal of 
the AT models. From a node, any other node can be reached directly or 
indirectly following references. 

— The ATMM node and tree attributes offer convenient and generic meth- 
ods for supporting the results of analysis tools. This allows us to translate 
results from a formal tool back into the AT domain and associate them 
to the original AT model (see Sect. 4.4). 

2. The query metamodel formalizes the security queries to be analyzed over 
attack trees. We support both qualitative queries (i.e., properties such as 
feasibility of attack) and quantitative queries (i.e., security metrics such as 
probability of successful attack, cheapest attack, etc.). 

3. The scenario metamodel represents attack scenarios (a.k.a. attack vectors) 
consisting of the steps leading to, e.g., the cheapest, fastest, or most damaging 
security breaches. 


Below we discuss these metamodels in more detail. 


1. AT Metamodel (ATMM). The ATMM metamodel is a combination of 
two separate metamodels, one representing the attack tree structure (Structure 
metamodel, Fig.4 left) and the other representing the attack tree attributes 
(Values metamodel, Fig. 4 right). This separation allows us to consider different 
attack scenarios modeled via the same attack tree, but decorated with different 
attributes. For example, it is easy to define attribute values based on the attacker 
type: a script kiddie, a malicious insider, etc. may all be interested in the same
asset, but each of them possesses different access privileges and is equipped with 
different resources. 


Structure Metamodel. The structure model, depicted in Fig. 4 on the left, repre- 
sents the structure of the attack tree. Its main class AttackTree contains a set of 
one or more Nodes, as indicated by the containment arrow between AttackTree 
and Node. One of these nodes is designated as the root of the tree, denoted by 
the root reference. Each Node is equipped with an id, used as a reference during 
transformation processes. Furthermore, each node has a (possibly empty) list of 
its parents and children, which makes it easy to traverse the AT. A node may
have a connector, i.e., a gate such as AND, OR, SAND (sequential-AND), etc. 
Fig. 4. The ATMM metamodel separated into the structure and values metamodels.
Some connectors, types, and purposes are omitted for clarity and denoted by ellipses. 


In addition to the structure specified by the metamodel, some constraints 
can be used to ensure that a model is a valid attack tree. For example, the 
tree cannot contain cycles, the nodes must form a connected graph, etc. These 
constraints are separately formulated in the Epsilon Validation Language (EVL 
[17]). An example of such a constraint is shown in Listing 1. 
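To make the structure metamodel and its validity constraints concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not ATTop's implementation. The classes Node and AttackTree and the helpers link and is_valid are hypothetical names. The three checks (a single parentless root, no cycles, all nodes reachable from the root) paraphrase the constraints described above rather than reproduce the EVL rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity
class Node:
    node_id: str
    connector: Optional[str] = None  # e.g. "AND", "OR", "SAND"; None for a basic step
    parents: list["Node"] = field(default_factory=list, repr=False)
    children: list["Node"] = field(default_factory=list)


@dataclass
class AttackTree:
    nodes: list[Node]
    root: Node

    def link(self, parent: Node, child: Node) -> None:
        """Add an edge, keeping parent and child references consistent."""
        parent.children.append(child)
        child.parents.append(parent)


def is_valid(tree: AttackTree) -> bool:
    # The root must be the one and only node without parents.
    if [n for n in tree.nodes if not n.parents] != [tree.root]:
        return False
    # Depth-first search from the root: detect cycles and record visited nodes.
    state: dict[int, str] = {}  # id(node) -> "open" while on the DFS stack, then "done"

    def acyclic_from(n: Node) -> bool:
        state[id(n)] = "open"
        for c in n.children:
            seen = state.get(id(c))
            if seen == "open":  # back edge, hence a cycle
                return False
            if seen is None and not acyclic_from(c):
                return False
        state[id(n)] = "done"
        return True

    # Every node must also be reachable from the root (connectedness).
    return acyclic_from(tree.root) and len(state) == len(tree.nodes)
```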


Values Metamodel. The Values metamodel (Fig. 4, right side) describes how
values are attributed to nodes (arrow from Attribute on the right to Node on the 
left). Each Attribute contains exactly one Value, which can be of various (basic 
or complex) types: For example, RealValue is a type of Value that contains real 
(Double) numbers. A Domain groups all those attributes that have the same 
Purpose. By separating the purpose of attributes from their data type, we can 
use basic data types (integer, boolean, real number) for different purposes: For 
example, a real number (RealType) can be used in a Domain named “Maximum 
Duration”, where the purpose is a TimePurpose with timeType = MAXIMAL. 
A RealType number could also be used in a different Domain, say “Likelihood 
of attack” with the purpose to represent a probability (ProbabilityPurpose, not 
shown in the diagram). Thanks to the flexibility of this construct, the set of 
available domains is easily extensible. 
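The separation between structure and values can be sketched as follows. Everything here is hypothetical: the Python classes mirror Domain, Purpose, and Attribute only loosely, and the numbers are made up. The point is that two attacker profiles can decorate the same node ids of one tree with different attribute sets.

```python
from dataclasses import dataclass
from enum import Enum


class TimeType(Enum):
    MINIMAL = "minimal"
    MAXIMAL = "maximal"


@dataclass(frozen=True)
class TimePurpose:
    time_type: TimeType


@dataclass(frozen=True)
class ProbabilityPurpose:
    pass


@dataclass(frozen=True)
class Domain:
    name: str
    purpose: object  # a TimePurpose, a ProbabilityPurpose, ...


@dataclass(frozen=True)
class Attribute:
    node_id: str     # refers to a Node of the structure model by its id
    domain: Domain
    value: float     # stands in for a RealValue; other Value kinds are omitted


MAX_DURATION = Domain("Maximum Duration", TimePurpose(TimeType.MAXIMAL))
LIKELIHOOD = Domain("Likelihood of attack", ProbabilityPurpose())

# One tree structure, two decorations (made-up numbers for two attacker profiles).
script_kiddie = [Attribute("a", MAX_DURATION, 48.0), Attribute("a", LIKELIHOOD, 0.1)]
insider = [Attribute("a", MAX_DURATION, 2.0), Attribute("a", LIKELIHOOD, 0.7)]


def values_in(attrs: list[Attribute], domain: Domain) -> dict[str, float]:
    """Map each node id to its value in the given domain."""
    return {a.node_id: a.value for a in attrs if a.domain == domain}
```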


context ATMM!AttackTree {
  constraint OneAndOnlyOneChildWithoutParents {
    check : ATMM!Node.allInstances.select(n | n.parents.size() == 0).size() = 1
      and self.root = ATMM!Node.allInstances.select(n | n.parents.size() == 0).first()
  }
}


Listing 1. Constraint specifying that the root node is the only node in an ATMM AT 
with no parents. 
2. Query Metamodel. Existing attack tree analysis tools such as ATE,
ATCalc, ADTool 2.0, etc. support only a limited set of queries and lack the
flexibility to customize one's own security queries. Using the MDE approach,
we have developed the Query metamodel shown in Fig. 5. It allows a security
practitioner to compute a wide range of qualitative and quantitative metrics
over attributes such as cost, time, damage, etc.

Using this metamodel in ATTop, a security practitioner can pose all the security
queries available in the aforementioned tools. Furthermore, the metamodel
offers a more comprehensive set of security queries and lets users tailor their
own. For example, it is possible to ask whether a successful
attack can be carried out within 10 days and without spending more than $900.


[Figure 5 class diagram: ReachabilityQuery, ProbabilityQuery, ExpectedValueQuery (domain : Domain), and OptimalQuery (domain : Domain, goal : OptimizationGoal); queries have constraints of type Constraint (operator : RelationalOperator, domain : Domain, value : Value); enumerations RelationalOperator {GREATER, SMALLER, EQUAL} and OptimizationGoal {MAXIMUM, MINIMUM}.]

Fig. 5. The query metamodel. The types ‘Domain’ and ‘Value’ refer to the classes of 
the ATMM metamodel (Fig. 4). 


The main component of the query metamodel is the element named Query. 
A query can be one of the following: 


— Reachability, i.e., Is it feasible to reach the top node of an attack tree? Sup- 
ported by every tool. 

— Probability, i.e., What is the probability that a successful attack occurs? Sup- 
ported by every tool. 

— ExpectedValue, i.e., What is the expected (average) value of a given quantity 
over all possible attacks? Supported by ATTop.

— Optimality, i.e., Which attack is optimal w.r.t. a given attribute
(e.g., time or cost)? Supported by ATE, ADTool 2.0, ATTop. 


Furthermore, a query can be framed by combining one of the above query types 
with a set of Constraints over the AT attributes. A Constraint is made of a 
RelationalOperator, a Value and its Domain. For example, the constraint “within 
10 days” is expressed with the SMALLER RelationalOperator, a Value of 10, and 
the Domain of “Maximum Duration”. 
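A query with constraints in the style of the "within 10 days" example can be represented and checked against the totals of a candidate attack, as in the illustrative Python sketch below. All names are hypothetical. Reading SMALLER as a strict comparison is our own assumption, not something the metamodel fixes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class QueryKind(Enum):
    REACHABILITY = "reachability"
    PROBABILITY = "probability"
    EXPECTED_VALUE = "expected value"
    OPTIMAL = "optimal"


class RelationalOperator(Enum):
    GREATER = ">"
    SMALLER = "<"   # assumption: strict comparison
    EQUAL = "=="


_COMPARE = {
    RelationalOperator.GREATER: lambda a, b: a > b,
    RelationalOperator.SMALLER: lambda a, b: a < b,
    RelationalOperator.EQUAL: lambda a, b: a == b,
}


@dataclass(frozen=True)
class Constraint:
    operator: RelationalOperator
    domain: str      # domain name, e.g. "Maximum Duration"
    value: float


@dataclass
class Query:
    kind: QueryKind
    constraints: list[Constraint] = field(default_factory=list)
    domain: Optional[str] = None   # used by EXPECTED_VALUE and OPTIMAL queries
    goal: Optional[str] = None     # "MINIMUM" or "MAXIMUM" for OPTIMAL queries


def admissible(query: Query, attack_totals: dict[str, float]) -> bool:
    """True if a candidate attack's totals (domain name -> value) meet every constraint."""
    return all(_COMPARE[c.operator](attack_totals[c.domain], c.value)
               for c in query.constraints)
```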




3. Scenario Metamodel. ATTop can provide different kinds of results. Some
are numeric, such as the probability of executing an attack or the maximum cost
of executing an attack. Others contain qualitative information, such as an attack
vector, which is a partially ordered set of basic attack steps resulting
in the compromise of an asset under a given set of constraints (for example,
incurring minimum cost). In order to properly trace back the qualitative output
to the original attack tree, we use the Scenario metamodel (see Fig. 6).

The Scenario metamodel is used to represent attack vectors. In our context,
we consider an attack vector to be a Schedule in which there is only one Executor,
which we name “Attacker”. The sequence of Tasks appearing in a Scenario is
then interpreted as the sequence of attack steps the Attacker needs to carry
out in order to reach their objective. Each attack step is actually a node of the
original AT, and is represented as an Executable whose name corresponds to the
id of the original Node. The timing information in each Task describes the
start (startTime) and end (endTime) time points of each attack step. Note that
an attack step can start but not end before the objective is reached (multiplicity “1”
for startTime and “0..1” for endTime).
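The Scenario metamodel can be mirrored by a few Python data classes. This is a sketch with hypothetical names and made-up step identifiers, not the metamodel of [29] itself. The optional end time reflects the 0..1 multiplicity of endTime.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Executor:
    name: str


@dataclass(frozen=True)
class Executable:
    name: str                         # id of the corresponding AT Node


@dataclass
class Task:
    executor: Executor
    executable: Executable
    start_time: float                 # startTime, multiplicity 1
    end_time: Optional[float] = None  # endTime, multiplicity 0..1


@dataclass
class Schedule:
    executors: list[Executor] = field(default_factory=list)
    executables: list[Executable] = field(default_factory=list)
    tasks: list[Task] = field(default_factory=list)

    def unfinished_steps(self) -> list[str]:
        """Attack steps that started but had not ended when the objective was reached."""
        return [t.executable.name for t in self.tasks if t.end_time is None]


attacker = Executor("Attacker")
step_a, step_b = Executable("step_a"), Executable("step_b")
vector = Schedule([attacker], [step_a, step_b], [
    Task(attacker, step_a, 0.0, 120.0),
    Task(attacker, step_b, 120.0),    # started, never finished
])
```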


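[Figure 6 class diagram: a Schedule contains executors (Executor, name : String), executables (Executable, name : String), and tasks (Task, name : String); each Task refers to an executor and an executable and has a startTime [1] and an endTime [0..1] of type Time (value : Float).]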


Fig. 6. The Scenario metamodel from [29]. In the context of ATs, all instances of this 
metamodel will have only one Executor, the Attacker; Executables represent attack steps 
(i.e. Nodes from the AT), while a Scenario is known as an attack vector. 


4 Model Transformations 


ATTop supports horizontal and vertical model transformations. Figure 7 illus- 
trates the difference between these. Horizontal transformations convert one 
model into another that conforms to the same metamodel, e.g., a transformation 
from one AT analysis tool to another (where the models of both tools are repre- 
sented in the ATMM metamodel). Vertical transformations transform a model 
into another that conforms to a different metamodel, e.g., the transformation 
from an AT into a timed automaton. A key feature of ATTop is that it also 
provides vertical transformations in the reverse direction: analysis results (e.g., 
traces produced by UPPAAL) are interpreted in terms of the original attack tree 
model. 
4.1 Horizontal Transformations: Unifying Dialects of Attack Trees


One of the goals of applying the model-driven approach is to facilitate interop- 
eration between different tools. To this end, we provide transformations to and 
from the file formats of ADTool 2.0 [12], Attack Tree Evaluator (ATE) [5], and 
ATCalc [2]. 

Due to the different features supported by the various tools, not every input
formalism can be converted to every other format with its full semantics preserved. For
example, ATCalc performs only timing analysis, while ADTool can also perform 
cost analysis of untimed attack trees. In such cases, the transformations convert 
whatever information is supported by their output format, omitting unsupported 
features. As the ATMM metamodel unifies the features of all the listed tools, 
transformations into this metamodel are lossless. 


Example 2. ATE Transformation. The Attack Tree Evaluator [5] tool can only 
process binary trees. Using a simple transformation, we can transform any 
instance of the ATMM into a binary tree. A simplified version of this trans- 
formation, written in ETL, is given in Listing 2. This transformation is based 
on a recursive method that traverses the tree. For every node with more than 
two children, it nests all but the first child under a new node until no more than 
two children remain. 
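For readers who do not use ETL, the same recursive idea can be sketched in Python. This is our own illustration with hypothetical class and function names, not the ATTop transformation. Unlike the simplified Listing 2, it copies the parent's gate into the new intermediate node. That is an assumption which keeps the meaning of associative gates such as AND and OR; gates such as k-out-of-n would need a different treatment.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_fresh = count(1)  # counter for names of the intermediate nodes


@dataclass
class Node:
    name: str
    connector: Optional[str] = None           # gate, or None for a basic attack step
    children: list["Node"] = field(default_factory=list)


def to_binary(node: Node) -> Node:
    """Rewrite the subtree in place so that no node has more than two children."""
    if len(node.children) > 2:
        # Nest all but the first child under a fresh node carrying the same gate.
        rest = Node(f"{node.name}_aux{next(_fresh)}", node.connector, node.children[1:])
        node.children = [node.children[0], rest]
    for child in node.children:   # recursion also splits `rest` further if needed
        to_binary(child)
    return node


def max_arity(node: Node) -> int:
    return max([len(node.children)] + [max_arity(c) for c in node.children])


def leaves(node: Node) -> list[str]:
    if not node.children:
        return [node.name]
    return [leaf for c in node.children for leaf in leaves(c)]
```

The leaf order is preserved, so sequential gates keep their child order as well.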


4.2 Vertical Transformations: Analyzing ATs via Timed Automata 


Thus far we have described the transformations to and from dedicated tools for 
attack trees. In this section we introduce a vertical transformation which we use 
in ATTop to translate attack trees into the more general-purpose formalism of 
timed automata (TA). Specifically, we provide model transformations to TAs 
that can be analyzed by the UPPAAL tool to obtain the wide range of qualitative 
and quantitative properties supported by the query metamodel. 

Our transformation targets the UPPAAL metamodel described in [29]. It 
transforms each element of the attack tree (i.e., each gate and basic attack step) 


[Figure 7 diagram: at the metamodel level, the attack tree metamodel (ATMM) and the UPPAAL timed automata metamodel; at the model level, an AT specified in Galileo format (input to the ATCalc tool) and an AT specified in ADTool 2.0 XML format, which conform to the ATMM and are related by a horizontal transformation, and timed automata models of AT elements, which conform to the UPPAAL metamodel and are obtained by a vertical transformation.]


Fig. 7. Examples of horizontal and vertical model transformations. 




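var structure := AttackTree.all.first();
structure.Root.NodeToBinary();

// Recursively restructure the subtree rooted at self so that
// no node has more than two children.
operation Node NodeToBinary() {
  if (self.Children.size() > 2) {
    // Create a fresh intermediate node below self.
    var newNode = new Node();
    newNode.Parents.add(self);
    structure.Nodes.add(newNode);

    // Move all children except the first under the new node.
    var replaceNodes := self.Children.excluding(self.Children.first());
    newNode.Children := replaceNodes;
    self.Children.removeAll(replaceNodes);
    self.Children.add(newNode);
    // Simplification: the new node's connector and the Parents
    // references of the moved children are not set explicitly here.
  }
  // Recurse into all children; the new node is split again
  // if it still has more than two children.
  for (child in self.Children) {
    child.NodeToBinary();
  }
}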


Listing 2. Transformation of an ATMM attack tree to a binary AT 


into a timed automaton. These automata communicate via signals and together
describe the behavior of the entire tree. For example, Fig. 8 shows the timed
automaton obtained by transforming an attack step with a deterministic time
to execute of 5 units.

Fig. 8. Example of a timed automaton modeling a basic attack step with a fixed
time to execute of 5 units.

Depending on the features of the model and the desired property to be
analyzed, the output of the transformation can be analyzed by different
extensions of UPPAAL. For example, UPPAAL CORA supports the analysis
of cost-optimal queries, such as “What is the lowest cost an attacker needs to
incur in order to complete an attack?”, while UPPAAL-SMC supports statistical
model checking, allowing the analysis of models with stochastic times and
probabilistic attack steps with queries such as “What is the probability that an
attacker successfully completes an attack within one hour?”. The exact results
of UPPAAL CORA come at the cost of state space explosion, which limits the
applicability of this approach to larger problems. On the other hand, the speed
and scalability of the simulation-based UPPAAL-SMC come at the price of
approximate results and the unavailability of (counter-)example traces.
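The kind of question UPPAAL-SMC answers can be illustrated with a plain Monte Carlo estimate. The sketch below is not UPPAAL and rests on strong simplifying assumptions of our own:

- basic attack steps have exponentially distributed durations with hypothetical rates;
- all steps start at time 0 and run in parallel;
- an AND gate completes when its last child does, and an OR gate when its first child does.

The helper exact_single_step gives the closed form for a single step and serves only as a sanity check.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    gate: Optional[str] = None   # "AND", "OR", or None for a basic attack step
    rate: float = 1.0            # exponential rate of a basic step (assumption)
    children: list["Step"] = field(default_factory=list)


def completion_time(s: Step, rng: random.Random) -> float:
    """Sample when this (sub)goal is reached, all basic steps starting at time 0."""
    if s.gate is None:
        return rng.expovariate(s.rate)
    times = [completion_time(c, rng) for c in s.children]
    return max(times) if s.gate == "AND" else min(times)


def prob_within(root: Step, bound: float, runs: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Pr[the attack completes within `bound`]."""
    rng = random.Random(seed)
    hits = sum(completion_time(root, rng) <= bound for _ in range(runs))
    return hits / runs


def exact_single_step(rate: float, bound: float) -> float:
    """Closed form for one exponential step, useful as a sanity check."""
    return 1.0 - math.exp(-rate * bound)
```

As with statistical model checking in general, the estimate carries sampling error and comes with no example trace.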


4.3 Query Transformation: From Domain-Specific to Tool-Specific 


ATTop aims to make the analysis of ATs accessible also to users who are less familiar
with the underlying tools. One challenge for such users is that every tool has
its own method to specify what property of the AT should be computed.
Section 3 describes our metamodel for expressing a wide range of possible
queries; we now transform such queries to a tool-specific format. Many tools
support only a single query (e.g., ATE [5] only supports Pareto curves of cost
vs. probability), in which case no transformation is performed and ATTop only
accepts that single query as input.

The UPPAAL tool is an example of a tool supporting many different queries. 
After transforming the AT to a timed automaton (cf. Sect. 4.2), we transform
the query into the textual formula supported by UPPAAL. The basic form of 
this formula is determined by the query type (e.g., a ReachabilityQuery will be 
translated as “E<> toplevel.completed”, which asks for the existence of a trace 
that reaches the top level event), while constraints add additional terms limiting 
the permitted behavior of the model. By using an UPPAAL-specific metamodel 
for its query language linked to the TA metamodel, our transformation can easily 
refer to the TA elements that correspond to converted AT elements. 
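A query transformation of this kind can be sketched as string generation. Only the reachability formula “E<> toplevel.completed” comes from the text above. The clock and variable names in the constraints (globalTime and totalCost in the usage below) are hypothetical. The probability case uses UPPAAL-SMC's Pr[<=bound](<> ...) form as an assumed target.

```python
from dataclasses import dataclass, field

GOAL = "toplevel.completed"  # reachability target used in the text


@dataclass(frozen=True)
class Constraint:
    variable: str   # hypothetical TA clock/variable holding the constrained quantity
    operator: str   # "<", ">" or "=="
    value: float


@dataclass
class Query:
    kind: str                               # "reachability" or "probability"
    constraints: list[Constraint] = field(default_factory=list)
    time_bound: float = 0.0                 # simulation bound, probability queries only


def to_uppaal(q: Query) -> str:
    """Build a UPPAAL query string: the query type fixes the form, constraints add terms."""
    terms = [GOAL] + [f"{c.variable} {c.operator} {c.value:g}" for c in q.constraints]
    state = " && ".join(terms)
    if q.kind == "reachability":
        return f"E<> {state}"
    if q.kind == "probability":
        return f"Pr[<={q.time_bound:g}] (<> {state})"
    raise ValueError(f"unsupported query kind: {q.kind}")
```

For example, a reachability query with the constraints globalTime < 10 and totalCost < 900 becomes “E<> toplevel.completed && globalTime < 10 && totalCost < 900”.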


4.4 Result Transformation: From Tool-Specific to Domain-Specific 


Analyses done with a back-end tool produce results that may be immediately
understandable only to an expert in that tool. An important feature of ATTop
that eases its use by non-experts is that it provides interpretations of these results
in terms of the original AT.

For example, given an attack tree whose leaves are annotated with (time- 
dependent) costs, UPPAAL can produce a trace showing the cheapest way to 
reach a security breach (optionally within a specified time bound). This trace 
is given in a textual format, with many details that are irrelevant to a security 
analyst. Such a scenario is much easier to understand when shown in terms of
the attack tree (for example, the scenario in Fig. 11 corresponds to several pages
of UPPAAL output). This is exactly the purpose of the reverse transformations:
UPPAAL’s textual traces are automatically parsed by ATTop, generating
instances of the Trace metamodel described in [29]. To do so, the transformation 
from ATMM to UPPAAL retains enough information to trace identifiers in the 
UPPAAL model back to the elements of the AT. When parsing the trace, ATTop 
extracts only the relevant events (e.g., the starts and ends of attack steps) and 
related information (e.g., time). This information is then stored as an instance 
of the Scenario metamodel described in Sect. 3. 
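The parsing step can be sketched as follows. The one-event-per-line trace format and the TA-to-AT identifier map are hypothetical simplifications; real UPPAAL traces look different. The sketch only shows the idea: keep start and end events, map identifiers back to AT nodes, and leave the end time empty for steps that never finish.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical simplified trace format: one event per line, e.g. "t=120 start bas_3".
EVENT = re.compile(r"t=(?P<time>\d+(?:\.\d+)?)\s+(?P<kind>start|end)\s+(?P<ta_id>\w+)")


@dataclass
class Task:
    step: str                    # id of the original AT node
    start: float
    end: Optional[float] = None  # stays None if the step never finished


def trace_to_tasks(trace: str, ta_to_at: dict[str, str]) -> list[Task]:
    """Keep only start/end events of attack steps, mapped back to AT node ids."""
    tasks: dict[str, Task] = {}
    for line in trace.splitlines():
        m = EVENT.search(line)
        if m is None:
            continue             # irrelevant detail for the security analyst
        step, time = ta_to_at[m["ta_id"]], float(m["time"])
        if m["kind"] == "start":
            tasks[step] = Task(step, time)
        else:
            tasks[step].end = time
    return sorted(tasks.values(), key=lambda t: t.start)
```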

In the generated Schedule, attack steps are represented as Executables, while 
Tasks indicate the start and finish time of each attack step, thus describing the 
attack vector. Only one Executor is present in any attack vector produced by 
this transformation, and that is the Attacker. An example of such a generated 
schedule can be seen in Fig. 11. 


5 Tool Support 


We have developed the tool ATTop to enable users to easily use the transformations
described in this paper, without requiring knowledge of the underlying techniques
or formalisms. ATTop automatically selects which transformations to apply based
on the available inputs and desired outputs. For example, if the user provides an
ADTool input and requests an UPPAAL output, ATTop will automatically first execute
the transformation from ADTool to the ATMM, and then the transformation from
ATMM to UPPAAL.
Users operate the tool by specifying input files and their corresponding
languages, and the desired output files and languages. ATTop then performs a
search for the shortest sequence of transformations achieving the desired
outputs from the inputs. For example, Fig. 9 shows the tool’s main screen, where
the user has provided an input AT in Galileo format. The user can now choose
between different queries and analysis engines.

Fig. 9. Screenshot of ATTop’s main screen, allowing input file selection, query
specification, and output selection.
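One straightforward way to search for the shortest sequence of transformations is a breadth-first search over languages, treating each transformation as an edge. The registry below is hypothetical (the names are ours, not ATTop's), and the sketch handles only a single input and a single output.

```python
from collections import deque
from typing import Optional

# Hypothetical registry: (source language, target language) -> transformation name.
TRANSFORMATIONS = {
    ("ADTool", "ATMM"): "adtool2atmm",
    ("ATMM", "ADTool"): "atmm2adtool",
    ("Galileo", "ATMM"): "galileo2atmm",
    ("ATMM", "Galileo"): "atmm2galileo",
    ("ATMM", "ATE"): "atmm2ate",
    ("ATMM", "UPPAAL"): "atmm2uppaal",
}


def shortest_chain(source: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for the fewest transformations turning source into target."""
    came_from: dict[str, Optional[tuple[str, str]]] = {source: None}
    queue = deque([source])
    while queue:
        lang = queue.popleft()
        if lang == target:
            chain = []
            while came_from[lang] is not None:   # walk back to the source
                lang, name = came_from[lang]
                chain.append(name)
            return chain[::-1]
        for (src, dst), name in TRANSFORMATIONS.items():
            if src == lang and dst not in came_from:
                came_from[dst] = (lang, name)
                queue.append(dst)
    return None                                  # no chain of transformations exists
```

With this registry, an ADTool input and an UPPAAL output yield the two-step chain described above: first to the ATMM, then to UPPAAL.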


6 Case Study 


As a case study we use the example annotated attack tree given in Fig. 2. We apply
ATTop to automatically compute several qualitative and quantitative security
metrics. Specifically, we apply a horizontal transformation to convert the model
from the ATCalc format to that accepted by ADTool 2.0, and a vertical
transformation to analyze the model using UPPAAL.

We specify the AT in the Galileo format as accepted by ATCalc. Analysis with
ATCalc yields a graph of the probability of a successful attack over time, as
shown in Fig. 10. Next, we would like to determine the minimal cost of a
successful attack, which ATCalc cannot provide. Therefore, we use ATTop to
transform the AT to the ADTool 2.0 format, and use ADTool 2.0 to compute the
minimal cost (yielding $270).

Next, we perform a more comprehensive timing analysis using the vertical
transformation described in Sect. 4.2. We use ATTop to transform the AT to a
timed automaton that can be analyzed using the UPPAAL tool. We also transform
a query (OptimalityQuery asking for minimal time) to the corresponding UPPAAL
query. Combining these, we obtain a trace for the fastest successful attack, which
ATTop transforms into a scenario in terms of the AT as described in Sect. 4.4.

[Figure 10 plot: probability of successful attack (y-axis) versus time in hours (x-axis).]
Fig. 10. ATCalc plot showing probability of successful attack over time
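Metrics such as minimal cost are commonly computed bottom-up on trees without shared subtrees. For minimal cost, take the minimum over OR children and the sum over AND children. For minimal time, a parallel AND takes the maximum and SAND the sum. The Python sketch below illustrates this on a small hypothetical tree. It is not how ADTool 2.0 or UPPAAL compute their results. It also optimizes cost and time separately, so the cheapest and the fastest attacks may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    gate: Optional[str] = None  # "AND", "SAND", "OR", or None for a basic attack step
    cost: float = 0.0           # used for basic steps only
    time: float = 0.0           # used for basic steps only
    children: list["Node"] = field(default_factory=list)


def min_cost(n: Node) -> float:
    if n.gate is None:
        return n.cost
    costs = [min_cost(c) for c in n.children]
    return min(costs) if n.gate == "OR" else sum(costs)  # AND/SAND need every child


def min_time(n: Node) -> float:
    if n.gate is None:
        return n.time
    times = [min_time(c) for c in n.children]
    if n.gate == "OR":
        return min(times)
    return max(times) if n.gate == "AND" else sum(times)  # AND parallel, SAND sequential
```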
[Figure 11 chart: the executed attack steps sw_vulnerability, find_WLAN, break_WPA_keys, get_credentials, and a step whose label is truncated in the extraction (“…cious_script”), plotted against time in minutes with axis marks at 0, 120, 300, 600, 660, and 690.]

Fig. 11. Scenario of fastest attack as computed by UPPAAL. The executed steps and
their start-end times are also shown in Fig. 2.


The resulting scenario is shown in Fig. 11. Running the whole process, including 
the transformations and the analysis with UPPAAL, took 6.5 s on an Intel®
Core™ i7 CPU 860 at 2.80 GHz running Ubuntu 16.04 LTS. 


7 Conclusions 


We have presented a model-driven approach to the analysis of attack trees and a
software bridging tool, ATTop, that implements this approach. We support
interoperability between different existing analysis tools, as well as our own analysis
using the popular tool UPPAAL as a back-end engine. 

Formal methods have the advantage of being precise, unambiguous and sys- 
tematic. A lot of effort is spent on their correctness proofs. However, these ben- 
efits are only reaped if the tools supporting formal analysis are also correct. To 
the best of our knowledge, this work is among the first to apply the systematic 
approach of MDE to the development of formal analysis tools. 

Through model-driven engineering, we have developed the attack tree meta- 
model (ATMM) with support for the many extended formalisms of attack trees, 
integrating most of the features of such extensions. This unified metamodel pro- 
vides a common representation of attack trees, allowing easy transformations 
from and to the specific representations of individual tools such as ATCalc [2] 
and ADTool [12]. The metamodels for queries and schedules enable a user-friendly
interface that elicits relevant questions and presents results without
needing expert knowledge of the underlying analysis tool. 

We have presented our approach specifically for attack trees, but we believe it 
can be equally fruitful for different formalisms and tools as well (e.g. PRISM [24], 
STORM [9]) by using different metamodels and model transformations. We thus 
expect our approach to be useful in the development of other tools that bridge 
specialized domains and formal methods. 
Abstract. Designers of distributed database systems face the choice
between stronger consistency guarantees and better performance. A num- 
ber of applications only require read atomicity (RA) and prevention of 
lost updates (PLU). Existing distributed database systems that meet 
these requirements also provide additional stronger consistency guaran- 
tees (such as causal consistency), and therefore incur lower performance. 
In this paper we define a new distributed transaction protocol, ROLA, 
that targets applications where only RA and PLU are needed. We for- 
mally model ROLA in Maude. We then perform model checking to ana- 
lyze both the correctness and the performance of ROLA. For correctness, 
we use standard model checking to analyze ROLA’s satisfaction of RA 
and PLU. To analyze performance we: (a) use statistical model checking 
to analyze key performance properties; and (b) compare these perfor- 
mance results with those obtained by analyzing in Maude the well-known 
protocol Walter. Our results show that ROLA outperforms Walter. 


1 Introduction 


Distributed transaction protocols are complex distributed systems whose design 
is quite challenging because: (i) validating correctness is very hard to achieve by 
testing alone; (ii) the high performance requirements needed in many applica- 
tions are hard to measure before implementation; and (iii) there is an unavoidable 
tension between the degree of consistency needed for the intended applications 
and the high performance required of the transaction protocol for such
applications: balancing these two requirements well is essential.

In this work, we present our results on how to use formal modeling and 
analysis as early as possible in the design process to arrive at a mature design of a 
new distributed transaction protocol, called ROLA, meeting specific correctness 
and performance requirements before such a protocol is implemented. In this 
way, the above-mentioned design challenges (i) to (iii) can be adequately met. We
also show that this formal design approach makes it relatively easy to compare
ROLA with other existing transaction protocols.
ROLA in a Nutshell. Different applications require negotiating the consistency
vs. performance trade-offs in different ways. The key issue is the application’s
required degree of consistency, and how to meet such requirements with
high performance. Cerone et al. [4] survey a hierarchy of consistency models for 
distributed transaction protocols including (in increasing order of strength): 


— read atomicity (RA): either all or none of a distributed transaction’s updates 
are visible to another transaction (that is, there are no “fractured reads”); 

— causal consistency (CC): if transaction $T_2$ is causally dependent on transaction
$T_1$, then if another transaction sees the updates by $T_2$, it must also see the
updates of $T_1$ (e.g., if A posts something on social media, and C sees B’s
comment on A’s post, then C must also see A’s original post);

— parallel snapshot isolation (PSI): like CC but without lost updates; 

— and so on, all the way up to the well-known serializability guarantees. 


A key property of transaction protocols is the prevention of lost updates 
(PLU). The weakest consistency model in [4] satisfying both RA and PLU is PSI. 
However, PSI, and the well-known protocol Walter [20] implementing PSI, also 
guarantee CC. Cerone et al. conjecture that a system guaranteeing RA and PLU 
without guaranteeing CC should be useful, but up to now we are not aware of any 
such protocol. The point of ROLA is exactly to fill this gap: guaranteeing RA and 
PLU, but not CC. Two key questions are then: (a) are there applications needing 
high performance where RA plus PLU provide a sufficient degree of consistency? 
and (b) can a new design meeting RA plus PLU outperform existing designs, 
like Walter, meeting PSI? 

Regarding question (a), an example of a transaction that requires RA and 
PLU but not CC is the “becoming friends” transaction on social media. Bailis 
et al. [3] point out that RA is crucial for this operation: If Edinson and Neymar 
become friends, then Unai should not see a fractured read where Edinson is a 
friend of Neymar, but Neymar is not a friend of Edinson. An implementation of 
“becoming friends” must obviously guarantee PLU: the new friendship between 
Edinson and Neymar should not be lost. Finally, CC could be sacrificed for the 
sake of performance: Assume that Dani is a friend of Neymar. When Edinson 
becomes Neymar’s friend, he sees that Dani is Neymar’s friend, and therefore 
also becomes friends with Dani. The second friendship therefore causally depends
on the first one. However, it does not seem crucial that others are aware of this 
causality: If Unai sees that Edinson and Dani are friends, then it is not necessary 
that he knows that (this happened because) Edinson and Neymar are friends. 

Regarding question (b), Sect. 6 shows that ROLA clearly outperforms Walter 
in all performance requirements for all read/write transaction rates. 


Maude-Based Formal Modeling and Analysis. In rewriting logic [16], 
distributed systems are specified as rewrite theories. Maude [5] is a high- 
performance language implementing rewriting logic and supporting various 
model checking analyses. To model time and performance issues, ROLA is spec- 
ified in Maude as a probabilistic rewrite theory [1,5]. ROLA’s RA and PLU 
requirements are then analyzed by standard model checking, where we disregard 
time issues. To estimate ROLA’s performance, and to compare it with that of
Walter, we have also specified Walter in Maude, and subjected the Maude models
of both ROLA and Walter to statistical model checking analysis using the
PVESTA [2] tool. 


Main Contributions include: (1) the design, formal modeling, and model 
checking analysis of ROLA, a new transaction protocol having useful applications 
and meeting RA and PLU consistency properties with competitive performance; 
(2) a detailed performance comparison by statistical model checking between 
ROLA and the Walter protocol showing that ROLA outperforms Walter in all 
such comparisons; (3) to the best of our knowledge the first demonstration that, 
by a suitable use of formal methods, a completely new distributed transaction 
protocol can be designed and thoroughly analyzed, as well as be compared with 
other designs, very early on, before its implementation. 


2 Preliminaries 


Read-Atomic Multi-Partition (RAMP) Transactions. To deal with ever- 
increasing amounts of data, large cloud systems partition their data across multi- 
ple data centers. However, guaranteeing strong consistency properties for multi- 
partition transactions leads to high latency. Therefore, trade-offs that combine 
efficiency with weaker transactional guarantees for such transactions are needed. 

In [3], Bailis et al. propose an isolation model, read atomic isolation, and Read 
Atomic Multi-Partition (RAMP) transactions, that together provide efficient 
multi-partition operations that guarantee read atomicity (RA). 

RAMP uses multi-versioning and attaches metadata to each write. Reads use 
this metadata to get the correct version. There are three versions of RAMP; in 
this paper we build on RAMP-Fast. To guarantee that all partitions perform 
a transaction successfully or that none do, RAMP performs two-phase writes 
using the two-phase commit protocol (2PC). In the prepare phase, each timestamped
write is sent to its partition, which adds the write to its local database.¹
In the commit phase, each such partition updates an index which contains the 
highest-timestamped committed version of each item stored at the partition. 

RAMP assumes that there is no data replication: a data item is only stored at 
one partition. The timestamps generated by a partition P are unique identifiers 
but are sequentially increasing only with respect to P. A partition has access to 
methods GET_ALL(I : set of items) and PUT_ALL(W : set of (item, value) pairs).

PUT_ALL uses two-phase commit for each w in W. The first phase initiates 
a prepare operation on the partition storing w.item, and the second phase com- 
pletes the commit if each write partition agrees to commit. In the first phase, the 
client (i.e., the partition executing the transaction) passes a version
$v : (item, value, ts_v, md)$ to the partition, where $ts_v$ is a timestamp generated for the
transaction and md is metadata containing all other items modified in the same 
transaction. Upon receiving this version v, the partition adds it to a set versions. 


1 RAMP does not consider write-write conflicts, so that writes are always prepared 
successfully (which is why RAMP does not prevent lost updates). 
When a client initiates a GET_ALL operation, then for each $i \in I$ the client
will first request the latest version vector stored on the server for $i$. It will then
look at the metadata in the version vector returned by the server, iterating over
each item in the metadata set. If it finds an item in the metadata that has a
later timestamp than the $ts_v$ in the returned vector, this means the value for $i$
is out of date. The client can then request the RA-consistent version of $i$.
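The following single-process Python sketch illustrates RAMP-Fast's two-phase writes and the second GET_ALL round that repairs fractured reads. It simplifies things under our own assumptions: one global timestamp counter, no messages or failures, and hypothetical names. It follows the common RAMP-Fast formulation of computing, per item, the latest timestamp mentioned in the metadata.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_clock = count(1)  # simplification: a single global timestamp source


@dataclass(frozen=True)
class Version:
    item: str
    value: object
    ts: int         # ts_v: timestamp of the writing transaction
    md: frozenset   # metadata: the other items written by that transaction


@dataclass
class Partition:
    versions: dict = field(default_factory=dict)     # (item, ts) -> Version
    last_commit: dict = field(default_factory=dict)  # item -> highest committed ts

    def prepare(self, v: Version) -> None:
        self.versions[(v.item, v.ts)] = v             # RAMP never rejects a prepare

    def commit(self, item: str, ts: int) -> None:
        self.last_commit[item] = max(self.last_commit.get(item, 0), ts)

    def get(self, item: str, ts: Optional[int] = None) -> Version:
        """Latest committed version, or the version with the given timestamp."""
        return self.versions[(item, self.last_commit[item] if ts is None else ts)]


def make_store(initial: dict) -> dict:
    """One partition per item (no replication), each with a committed version at ts 0."""
    parts = {}
    for item, value in initial.items():
        parts[item] = Partition()
        parts[item].prepare(Version(item, value, 0, frozenset()))
        parts[item].commit(item, 0)
    return parts


def put_all(parts: dict, writes: dict) -> int:
    ts = next(_clock)
    for item, value in writes.items():                # phase 1: prepare everywhere
        parts[item].prepare(Version(item, value, ts, frozenset(writes) - {item}))
    for item in writes:                               # phase 2: commit everywhere
        parts[item].commit(item, ts)
    return ts


def get_all(parts: dict, items: set) -> dict:
    ret = {i: parts[i].get(i) for i in items}         # round 1: latest committed
    latest = {i: ret[i].ts for i in items}
    for v in ret.values():                            # newest ts mentioned in metadata
        for j in v.md & items:
            latest[j] = max(latest[j], v.ts)
    # Round 2: fetch the prepared version for items that were read too early.
    return {i: (parts[i].get(i, latest[i]) if latest[i] > ret[i].ts else ret[i]).value
            for i in items}
```

Consider a reader that arrives after one partition has committed a transaction but before the other has. It still obtains both new values, because the second round fetches the already prepared version.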


Rewriting Logic and Maude. In rewriting logic [16] a concurrent system
is specified as a rewrite theory $(\Sigma, E \cup A, R)$, where $(\Sigma, E \cup A)$ is a
membership equational logic theory [5], with $\Sigma$ an algebraic signature declaring sorts,
subsorts, and function symbols, $E$ a set of conditional equations, and $A$ a set
of equational axioms. It specifies the system’s state space as an algebraic data
type. $R$ is a set of labeled conditional rewrite rules, specifying the system’s local
transitions, of the form $[l] : t \longrightarrow t'$ if $cond$, where $cond$ is a condition and $l$ is a
label. Such a rule specifies a transition from an instance of $t$ to the corresponding
instance of $t'$, provided the condition holds.

Maude [5] is a language and tool for specifying, simulating, and model checking
rewrite theories. The distributed state of an object-oriented system is formalized
as a multiset of objects and messages. A class $C$ with attributes $att_1$ to $att_n$
of sorts $s_1$ to $s_n$ is declared $\texttt{class}\ C \mid att_1 : s_1, \ldots, att_n : s_n$. An object
of class $C$ is modeled as a term $\langle o : C \mid att_1 : v_1, \ldots, att_n : v_n \rangle$, with $o$ its
object identifier, and where the attributes $att_1$ to $att_n$ have the current values
$v_1$ to $v_n$, respectively. Upon receiving a message, an object can change its state
and/or send messages to other objects. For example, the rewrite rule

$$\mathtt{rl}\ [l] : m(O, z)\ \langle O : C \mid a1 : x,\ a2 : O' \rangle \;\Longrightarrow\; \langle O : C \mid a1 : x + z,\ a2 : O' \rangle\ m'(O', x + z)$$

defines a transition where an incoming message $m$, with parameters $O$ and $z$, is
consumed by the target object $O$ of class $C$, the attribute $a1$ is updated to $x +
z$, and an outgoing message $m'(O', x + z)$ is generated.
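As a rough analogy only (this is Python, not Maude), applying the rule above to a configuration can be sketched as follows. The configuration is modeled as a frozenset, which, unlike a Maude multiset, cannot hold duplicate messages. The class and function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Obj:          # < O : C | a1 : x, a2 : O' >
    oid: str
    cls: str
    a1: int
    a2: str


@dataclass(frozen=True)
class Msg:          # m(O, z) or m'(O', x + z)
    name: str
    target: str
    arg: int


def rule_l(config: frozenset) -> Optional[frozenset]:
    """Apply rule [l] once if a message m(O, z) meets object O of class C; else None."""
    for msg in config:
        if not (isinstance(msg, Msg) and msg.name == "m"):
            continue
        for obj in config:
            if isinstance(obj, Obj) and obj.oid == msg.target and obj.cls == "C":
                new_a1 = obj.a1 + msg.arg   # x + z
                rest = config - {msg, obj}  # consume the message and the old object
                return rest | {replace(obj, a1=new_a1), Msg("m'", obj.a2, new_a1)}
    return None
```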


Statistical Model Checking and PVESTA. Probabilistic distributed systems
can be modeled as probabilistic rewrite theories [1] with rules of the form

$$[l] : t(\vec{x}) \longrightarrow t'(\vec{x}, \vec{y}) \ \text{if}\ cond(\vec{x}) \ \text{with probability}\ \vec{y} := \pi(\vec{x})$$

where the term $t'$ has new variables $\vec{y}$ disjoint from the variables $\vec{x}$ in the
term $t$. The concrete values of the new variables $\vec{y}$ in $t'(\vec{x}, \vec{y})$ are chosen
probabilistically according to the probability distribution $\pi(\vec{x})$.

Statistical model checking [18,21] is an attractive formal approach to analyzing
(purely) probabilistic systems. Instead of offering a yes/no answer, it can
verify a property up to a user-specified level of confidence by running
Monte-Carlo simulations of the system model. We use PVESTA [2], a parallelization
of the tool VESTA [19], to statistically model check purely probabilistic
systems against properties expressed as QuaTEx expressions [1]. The expected
value of a QuaTEx expression is iteratively evaluated w.r.t. two parameters $\alpha$
and $\delta$ by sampling, until we obtain a value $v$ such that, with $(1-\alpha)100\%$ statistical
confidence, the expected value lies in the interval $[v - \frac{\delta}{2}, v + \frac{\delta}{2}]$.
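The stopping criterion described above can be sketched as follows. This is a simplified illustration, not PVESTA's implementation: it assumes a normal-approximation confidence interval (PVESTA's actual statistical test may differ), and the function name estimate_mean and its batch, max_runs and seed parameters are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def estimate_mean(sample_run, alpha=0.05, delta=0.1, batch=30, max_runs=100_000, seed=0):
    """Monte-Carlo estimate of E[X] with a (1 - alpha) confidence interval of width <= delta.

    sample_run(rng) simulates one run and returns the value of the run measure.
    Returns (estimate v, number of runs used).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)       # two-sided normal critical value
    values = []
    while len(values) < max_runs:
        values.extend(sample_run(rng) for _ in range(batch))
        v = statistics.fmean(values)
        half_width = z * statistics.stdev(values) / math.sqrt(len(values))
        if half_width <= delta / 2:                # E[X] in [v - delta/2, v + delta/2]
            return v, len(values)
    raise RuntimeError("confidence target not reached within max_runs")


# Example run measure: a uniform sample on [0, 1), whose expected value is 0.5.
v, runs = estimate_mean(lambda rng: rng.random(), alpha=0.05, delta=0.05)
```

Sampling in batches keeps the number of interval recomputations small; a tighter $\delta$ or a smaller $\alpha$ increases the number of runs needed.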


3 The ROLA Multi-Partition Transaction Algorithm 


Our new algorithm for distributed multi-partition transactions, ROLA, extends 
RAMP-Fast. RAMP-Fast guarantees RA, but it does not guarantee PLU since 
it allows a write to overwrite conflicting writes: When a partition commits a 
write, it only compares the write’s timestamp $t_1$ with the local latest-committed
timestamp $t_2$, and updates the latest-committed timestamp with whichever of $t_1$ and $t_2$
is larger. If the two timestamps are from two conflicting writes, then one of the writes is lost.
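The lost update can be reproduced in a few lines of Python. The sketch is hypothetical: integer timestamps, a single partition and the helper ramp_fast_commit stand in for RAMP-Fast's actual data structures, and only the "keep the larger timestamp" commit behavior described above is modeled.

```python
def ramp_fast_commit(latest_commit: dict, item: str, ts: int) -> None:
    """Commit as described above: keep whichever timestamp is larger."""
    latest_commit[item] = max(latest_commit.get(item, 0), ts)


store = {("x", 0): 0}        # (item, timestamp) -> value; timestamp 0 is the initial version
latest_commit = {"x": 0}

# Two concurrent read-modify-write transactions both read x at timestamp 0 ...
read_t1 = store[("x", latest_commit["x"])]
read_t2 = store[("x", latest_commit["x"])]

# ... then each writes x + 1 under its own timestamp (1 and 2) and commits.
store[("x", 1)] = read_t1 + 1
store[("x", 2)] = read_t2 + 1
ramp_fast_commit(latest_commit, "x", 1)
ramp_fast_commit(latest_commit, "x", 2)

final_value = store[("x", latest_commit["x"])]
print(final_value)  # 1: both increments committed, but one of them is lost
```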

ROLA’s key idea to prevent lost updates is to sequentially order writes on the 
same key from a partition’s perspective by adding to each partition a data struc- 
ture which maps each incoming version to an incremental sequence number. For 
write-only transactions the mapping can always be built; for a read-write transac- 
tion the mapping can only be built if there has not been a mapping built since the 
transaction fetched the value. This can be checked by comparing the sequence number
mapped to the timestamp of the last version prepared on the partition with the one
mapped to the timestamp of the fetched version. In this way, ROLA prevents lost updates
by allowing versions to be prepared only if no conflicting prepares occur concurrently.

More specifically, ROLA adds two partition-side data structures: sqn, denoting
the local sequence counter, and seq[ts], which maps a timestamp to a local
sequence number. ROLA also changes the data structure of versions in RAMP
from a set to a list. ROLA then adds two methods for read-write transactions: the
coordinator-side² method UPDATE(I : set of items, OP : set of operations) and the
partition-side method PREPARE_UPDATE(v : version, ts_prev : timestamp).
Furthermore, ROLA changes two partition-side methods in RAMP: PREPARE, 
besides adding the version to the local store, maps its timestamp to the increased 
local sequence number; and COMMIT marks versions as committed and updates 
an index containing the highest-sequenced-timestamped committed version of 
each item. These two partition-side methods apply to both write-only and read- 
write transactions. ROLA invokes RAMP-Fast’s PUT_ALL, GET_ALL and GET 
methods (see [3,14]) to deal with read-only and write-only transactions. 

ROLA starts a read-write transaction with the UPDATE procedure. It invokes 
RAMP-Fast’s GET_ALL method to retrieve the values of the items the client 
wants to update, as well as their corresponding timestamps. ROLA’s writes then
proceed in two phases: a first round of communication places each timestamped 
write on its respective partition. The timestamp of each version obtained previ- 
ously from the GET_ALL call is also packaged in this prepare message. A second 
round of communication marks versions as committed. 

On the partition side, the partition begins the PREPARE_UPDATE routine by
retrieving the last version in its versions list with the same item as the received
version. If such a version is not found, or if the version’s timestamp ts_v matches


2 The coordinator, or client, is the partition executing the transaction.
Algorithm 1. ROLA

Server-side Data Structures
  versions: list of versions (item, value, timestamp ts_v, metadata md)
  latestCommit[i]: last committed timestamp for item i
  seq[ts]: local sequence number mapped to timestamp ts
  sqn: local sequence counter

Server-side Methods
  GET same as in RAMP-Fast

5:  procedure PREPARE_UPDATE(v : version, ts_prev : timestamp)
6:    latest ← last w ∈ versions : w.item = v.item
7:    if latest = NULL or ts_prev = latest.ts_v then
8:      sqn ← sqn + 1; seq[v.ts_v] ← sqn; versions.add(v)
9:      return ACK
10:   else return latest

11: procedure PREPARE(v : version)
12:   sqn ← sqn + 1; seq[v.ts_v] ← sqn; versions.add(v)

13: procedure COMMIT(ts_c : timestamp)
14:   I_ts ← {w.item | w ∈ versions ∧ w.ts_v = ts_c}
15:   for i ∈ I_ts do
16:     if seq[ts_c] > seq[latestCommit[i]] then latestCommit[i] ← ts_c

Coordinator-side Methods
  PUT_ALL, GET_ALL same as in RAMP-Fast

17: procedure UPDATE(I : set of items, OP : set of operations)
18:   ret ← GET_ALL(I); ts_tx ← generate new timestamp
19:   parallel-for i ∈ I do
20:     ts_prev ← ret[i].ts_v; v ← ret[i].value
21:     w ← (item = i, value = op_i(v), ts_v = ts_tx, md = (I − {i}))
22:     p ← PREPARE_UPDATE(w, ts_prev)
23:     if p = latest then
24:       invoke application logic to, e.g., abort and/or retry the transaction
25:   end parallel-for
26:   parallel-for server s : s contains an item in I do
27:     invoke COMMIT(ts_tx) on s
28:   end parallel-for


the passed-in timestamp ts_prev, then the version is deemed prepared. The
partition keeps a record of this locally by incrementing a local sequence counter
and mapping the received version’s timestamp ts_v to the current value of the
sequence counter. Finally, the partition returns an ACK to the client. If ts_prev
does not match the timestamp of the last version in versions with the same
item, then this latest timestamp is simply returned to the coordinator.

If the coordinator receives an ACK from PREPARE_UPDATE, it immediately 
commits the version with the generated timestamp ts_tx. If the returned value is
instead a timestamp, the transaction is aborted. 
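To make the interplay of lines 5–16 and 17–28 concrete, here is a single-process Python sketch of Algorithm 1. It is a simplification under stated assumptions: timestamps are encoded as hypothetical (coordinator, counter) tuples, GET_ALL is replaced by passing the previously read timestamp and value explicitly, the parallel-for loops are collapsed to one item on one partition, and an abort is modeled by returning False. The demo replays two read-write transactions that both read x before either of them prepares.

```python
from dataclasses import dataclass, field
from typing import Optional

ACK = "ACK"


@dataclass(frozen=True)
class Version:
    item: str
    value: int
    ts: tuple                     # hypothetical timestamp encoding: (coordinator id, counter)
    md: frozenset = frozenset()   # other items written by the same transaction


@dataclass
class Partition:
    versions: list = field(default_factory=list)       # a list, so "last version" is well defined
    seq: dict = field(default_factory=dict)             # seq[ts] -> local sequence number
    latest_commit: dict = field(default_factory=dict)   # latestCommit[i] -> timestamp
    sqn: int = 0                                        # local sequence counter

    def _record(self, v: Version) -> None:
        self.sqn += 1
        self.seq[v.ts] = self.sqn
        self.versions.append(v)

    def prepare_update(self, v: Version, ts_prev):        # lines 5-10
        latest: Optional[Version] = next(
            (w for w in reversed(self.versions) if w.item == v.item), None)
        if latest is None or ts_prev == latest.ts:
            self._record(v)
            return ACK
        return latest

    def prepare(self, v: Version) -> None:                 # lines 11-12
        self._record(v)

    def commit(self, ts_c) -> None:                        # lines 13-16
        for i in {w.item for w in self.versions if w.ts == ts_c}:
            current = self.latest_commit.get(i)
            if current is None or self.seq[ts_c] > self.seq[current]:
                self.latest_commit[i] = ts_c


def update(p: Partition, item: str, op, ts_tx, ts_prev, read_value) -> bool:
    """Lines 18-28 for a single item on a single partition.

    ts_prev and read_value are what an earlier GET_ALL returned.
    """
    w = Version(item, op(read_value), ts_tx, frozenset())
    if p.prepare_update(w, ts_prev) is not ACK:
        return False          # line 24: application logic would abort and/or retry
    p.commit(ts_tx)
    return True


p = Partition()
p.prepare(Version("x", 0, ("init", 0)))
p.commit(("init", 0))

# T1 and T2 both read x = 0 at timestamp ("init", 0) before either of them prepares.
ts0 = p.latest_commit["x"]
ok1 = update(p, "x", lambda v: v + 1, ("p1", 1), ts0, 0)
ok2 = update(p, "x", lambda v: v + 1, ("p2", 1), ts0, 0)
print(ok1, ok2)  # True False
```

The second prepare is rejected because the partition's last version of x is no longer the one T2 read, so T2 must abort or retry instead of silently overwriting T1's increment.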


4 A Probabilistic Model of ROLA 


This section defines a formal executable probabilistic model of ROLA. The whole 
model is given at https://sites.google.com/site/fase18submission/.

As mentioned in Sect. 2, statistical model checking assumes that the system
is fully probabilistic; that is, it has no unquantified nondeterminism. We follow the
techniques in [6] to obtain such a model. The key idea is that message delays are 
sampled probabilistically from dense/continuous time intervals. The probability 
that two messages will have the same delay is therefore 0. If events only take 
place when a message arrives, then two events will not happen at the same time, 
and therefore unquantified nondeterminism is eliminated. 

We are also interested in correctness analysis of a model that captures all 
possible behaviors from a given initial configuration. We obtain such a nonde- 
terministic untimed model, that can be subjected to standard model checking 
analysis, by just removing all message delays from our probabilistic timed model. 


4.1 Probabilistic Sampling 


Nodes send messages of the form [Δ, rcvr <- msg], where Δ is the message
delay, rcvr is the recipient, and msg is the message content. When time Δ has
elapsed, this message becomes a ripe message {T, rcvr <- msg}, where T is the
“current global time” (used for analysis purposes only).

To sample message delays from different distributions, we use the following
functionality provided by Maude: the function random, where random(k)
returns the k-th pseudo-random number as a number between 0 and $2^{32} - 1$,
and the built-in constant counter with an (implicit) rewrite rule counter =>
N:Nat. The first time counter is rewritten, it rewrites to 0, the next time it
rewrites to 1, and so on. Therefore, each time random(counter) rewrites, it
rewrites to the next random number. Since Maude does not rewrite counter
when it appears in the condition of a rewrite rule, we encode a probabilistic
rewrite rule $t(\vec{x}) \longrightarrow t'(\vec{x}, \vec{y})$ if $cond(\vec{x})$ with probability $\vec{y} := \pi(\vec{x})$ in
Maude as the rule $t(\vec{x}) \longrightarrow t'(\vec{x}, sample(\pi(\vec{x})))$ if $cond(\vec{x})$. The following
operator sampleLogNormal is used to sample a value from a lognormal distribution
whose underlying normal distribution has mean MEAN and standard deviation SD:


op sampleLogNormal : Float Float -> [Float] .
eq sampleLogNormal(MEAN,SD) = exp(MEAN + SD * sampleNormal) .

op sampleNormal : -> [Float] . op sampleNormal : Float -> [Float] .
eq sampleNormal = sampleNormal(float(random(counter) / 4294967296)) .
eq sampleNormal(RAND) = sqrt(- 2.0 * log(RAND)) * cos(2.0 * pi * RAND) .
The term random(counter) / 4294967296 rewrites to a different “random” number
between 0 and 1 each time it is rewritten, and this is used to define the sampling
function. For example, the message delay rd to a remote site can be sampled
from a lognormal distribution whose underlying normal distribution has mean 3
and standard deviation 2 as follows:


eq rd = sampleLogNormal(3.0, 2.0) .
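For experimentation outside Maude, the same sampling scheme can be sketched in Python. Two assumptions apply: the textbook Box–Muller transform draws two independent uniform samples, which the sketch follows; and, as in the equation for sampleLogNormal, mu and sigma are the mean and standard deviation of the underlying normal distribution. Python's standard library also provides random.lognormvariate(mu, sigma) with this parameterization; the helper names below are hypothetical.

```python
import math
import random


def sample_normal(rng: random.Random) -> float:
    """Standard normal sample via Box-Muller, using two independent uniforms."""
    u1 = 1.0 - rng.random()   # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def sample_lognormal(rng: random.Random, mu: float, sigma: float) -> float:
    """exp(mu + sigma * Z): mu and sigma parameterize the underlying normal distribution."""
    return math.exp(mu + sigma * sample_normal(rng))


rng = random.Random(42)
rd = sample_lognormal(rng, 3.0, 2.0)   # one remote-delay sample, mirroring eq rd above
```

With these parameters the median delay is $e^{3} \approx 20$ time units, while the mean is $e^{3 + 2^2/2} = e^{5}$, which illustrates why the parameters should not be read as the mean and standard deviation of the delay itself.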


4.2 Data Types, Classes, and Messages 


We formalize ROLA in an object-oriented style, where the state consists of a 
number of partition objects, each modeling a partition of the database, and a 
number of messages traveling between the objects. A transaction is formalized as 
an object which resides inside the partition object that executes the transaction. 


Data Types. A version is a timestamped version of a data item (or key) and is 
modeled as a 4-tuple version (key, value, timestamp, metadata). A timestamp 
is modeled as a pair ts (addr, sqn) consisting of a partition’s identifier addr and 
a local sequence number sqn. Metadata are modeled as a set of keys, denoting, 
for each key, the other keys that are written in the same transaction. 

The sort OperationList represents lists of read and write operations as terms 
such as (x := read k1) (y := read k2) write(k1, x + y), where LocalVar
denotes the “local variable” that stores the value of the key read by the operation, 
and Expression is an expression involving the transaction’s local variables: 


op write : Key Expression -> Operation [ctor] .
op _:=read_ : LocalVar Key -> Operation [ctor] .
pr LIST{Operation} * (sort List{Operation} to OperationList) .
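A small Python analogue of such operation lists may help fix intuitions. The classes Read and Write and the function run_ops are hypothetical; expressions are represented as Python functions over the map of local variables, and the list is executed directly against a dictionary, ignoring versions, timestamps and partitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class Read:
    var: str                                  # var := read key
    key: str


@dataclass(frozen=True)
class Write:
    key: str
    expr: Callable[[Dict[str, int]], int]     # expression over the local variables


Operation = Union[Read, Write]


def run_ops(ops: List[Operation], db: Dict[str, int]) -> Dict[str, int]:
    """Execute an operation list in order against db and return the local variables."""
    local_vars: Dict[str, int] = {}
    for op in ops:
        if isinstance(op, Read):
            local_vars[op.var] = db[op.key]   # store the value read in the local variable
        else:
            db[op.key] = op.expr(local_vars)  # evaluate the expression, then write
    return local_vars


# (x := read k1) (y := read k2) write(k1, x + y)
ops = [Read("x", "k1"), Read("y", "k2"), Write("k1", lambda env: env["x"] + env["y"])]
db = {"k1": 2, "k2": 5}
env = run_ops(ops, db)   # env == {"x": 2, "y": 5}; db["k1"] is now 7
```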


Classes. A transaction is modeled as an object of the following class Txn: 


class Txn | operations : OperationList, readSet : Versions, 
localVars : LocalVars, latest : KeyTimestamps . 


The operations attribute denotes the transaction’s operations. The readSet 
attribute denotes the versions read by the read operations. localVars maps the 
transaction’s local variables to their current values. latest stores the local view 
as a mapping from keys to their respective latest committed timestamps. 

A partition (or site) stores parts of the database, and executes the trans- 
actions for which it is the coordinator/server. A partition is formalized as an 
object instance of the following class Partition: 


class Partition | datastore : Versions, sqn : Nat,
gotTxns : ObjectList, executing : Object,
committed : ObjectList, aborted : ObjectList,
tsSqn : TimestampSqn, latestCommit : KeyTimestamps,
votes : Vote, voteSites : TxnAddrSet,
1stGetSites : TxnAddrSet, 2ndGetSites : TxnAddrSet,
commitSites : TxnAddrSet .
The datastore attribute represents the partition’s local database as a list of
versions for each key stored at the partition. The attribute latestCommit maps to
each key the timestamp of its last committed version. tsSqn maps each version’s 
timestamp to a local sequence number sqn. The attributes gotTxns, executing, 
committed and aborted denote the transaction(s) which are, respectively, wait- 
ing to be executed, currently executing, committed, and aborted. 

The attribute votes stores the votes in the two-phase commit. The remaining 
attributes denote the partitions from which the executing partition is awaiting 
votes, committed acks, first-round get replies, and second-round get replies. 

The following shows an initial state (with some parts replaced by “...”) with
two partitions, p1 and p2, that are coordinators for, respectively, transactions
t1, and t2 and t3. p1 stores the data items x and z, and p2 stores y. Transaction
t1 is the read-only transaction (x1 := read x) (y1 := read y), transaction
t2 is a write-only transaction write(y, 3) write(z, 8), while transaction t3
is a read-write transaction on data item x. The state also includes a buffer of
messages in transit and the global clock value, and a table which assigns to each
data item the site storing the item. Initially, the value of each item is [0], the
version’s timestamp is empty (eptTS), and the metadata is an empty set.


eq init = { 0.0 | nil }
  < tb : Table | table : [sites(x,p1) ;; sites(y,p2) ;; sites(z,p1)] >
  < p1 : Partition |
      gotTxns: < t1 : Txn | operations: ((x1 := read x) (y1 := read y)),
                            readSet: empty, latest: empty,
                            localVars: (x1 |-> [0], y1 |-> [0]) >,
      datastore: (version(x, [0], eptTS, empty)
                  version(z, [0], eptTS, empty)),
      sqn: 1, ... >
  < p2 : Partition |
      gotTxns: < t2 : Txn | operations: (write(y, 3) write(z, 8)), ... >
               < t3 : Txn | operations: ((x1 := read x)
                                         write(x, x1 plus 1)), ... >,
      datastore: version(y, [0], eptTS, empty), ... > .


Messages. The message prepare(txn, version, sender) sends a version from a
write-only transaction to its partition, and prepare(txn, version, ts, sender)
does the same thing for other transactions, with ts the timestamp of the version
it read. The partition replies with a message prepare-reply(txn, vote, sender),
where vote tells whether this partition can commit the transaction. A message
commit(txn, ts, sender) marks the versions with timestamp ts as committed.
The message get(txn, key, ts, sender) asks for the highest-timestamped committed version or
a missing version for key by timestamp ts, and response1(txn, version, sender)
and response2(txn, version, sender) respond to first-/second-round get requests.
4.3 Formalizing ROLA’s Behaviors


This section formalizes the dynamic behaviors of ROLA using rewrite rules, 
referring to the corresponding lines in Algorithm 1. We only show 2 of the 15
rewrite rules in our model, and refer to the report [14] for further details.³


Receiving prepare Messages (lines 5-10). When a partition receives a prepare 
message for a read-write transaction, the partition first determines whether the 
timestamp of the last version (VERSION) in its local version list VS matches 
the incoming timestamp TS’ (which is the timestamp of the version read by 
the transaction). If so, the incoming version is added to the local store, the 
map tsSqn is updated, and a positive reply (true) to the prepare message is 
sent (“return ack” in our pseudo-code); otherwise, a negative reply (false, or 
“return latest” in the pseudo-code) is sent. Depending on whether the sender 
PID’ of the prepare message happens to be PID itself, the reply is equipped 
with a local message delay ld or a remote message delay rd, both of which are
sampled probabilistically from distributions with different parameters:⁴


crl [receive-prepare-rw] 
{T, PID <- prepare(TID, version(K, V, TS, MD), TS’, PID’)} 
< PID : Partition | datastore: VS, sqn: SQN, tsSqn: TSSQN, AS’ > 
=> 
if VERSION == eptVersion or tstamp(VERSION) == TS’ 
then < PID : Partition | datastore: (VS version(K,V,TS,MD)), sqn: SQN’, 
tsSqn: insert(TS,SQN’,TSSQN), AS’ > 
[if PID == PID’ then ld else rd fi, 
PID’ <- prepare-reply(TID, true, PID)] 
else < PID : Partition | datastore: VS, sqn: SQN, tsSqn: TSSQN, AS’ > 
[if PID == PID’ then ld else rd fi, 
PID’ <- prepare-reply(TID, false, PID)] fi 
if SQN’ := SQN + 1 /\ VERSION := latestPrepared(K,VS) .


Receiving Negative Replies (lines 23-24). When a site receives a prepare-reply 
message with vote false, it aborts the transaction by moving it to the aborted 
list, and removes PID’ from the “vote waiting list” for this transaction: 


rl [receive-prepare-reply-false-executing] 
{T, PID <- prepare-reply(TID, false, PID’)} 
< PID : Partition | executing: < TID: Txn | AS >, aborted: TXNS, 
voteSites: VSTS addrs(TID, (PID’ , PIDS)), AS’ > 
=> 
< PID : Partition | executing: noTxn, 
aborted: (TXNS ;; < TID: Txn | AS>), 
voteSites: VSTS addrs(TID, PIDS), AS’ >. 
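The effect of this rule on the coordinator's state can be sketched in Python as follows. PartitionState and receive_prepare_reply_false are hypothetical names, None plays the role of noTxn, and the vote-waiting list is modeled as a per-transaction set of site identifiers rather than Maude's addrs(...) terms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PartitionState:
    executing: Optional[str] = None                                 # tid being executed; None plays noTxn
    aborted: List[str] = field(default_factory=list)
    vote_sites: Dict[str, Set[str]] = field(default_factory=dict)  # tid -> sites still to vote


def receive_prepare_reply_false(p: PartitionState, tid: str, sender: str) -> bool:
    """Handle prepare-reply(tid, false, sender); return False if the rule does not match."""
    if p.executing != tid or sender not in p.vote_sites.get(tid, set()):
        return False
    p.executing = None                 # executing: noTxn
    p.aborted.append(tid)              # aborted: (TXNS ;; < TID : Txn | AS >)
    p.vote_sites[tid].discard(sender)  # addrs(TID, (PID', PIDS)) becomes addrs(TID, PIDS)
    return True
```

Once the transaction has been aborted, a later negative reply from another site no longer matches, just as the Maude rule requires the transaction to be in the executing attribute.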


3 We do not give variable declarations, but follow the convention that variables are 
written in (all) capital letters. 
4 The variable AS’ denotes the “remaining” attributes in the object. 
5 Correctness Analysis of ROLA


In this section we use reachability analysis to analyze whether ROLA guarantees 
read atomicity and prevents lost updates.

For both correctness and performance analysis, we add to the state an object


< m : Monitor | log: log > 


which stores crucial information about each transaction. The log is a list of 
records record(tid, issueTime, finishTime, reads, writes, committed), with tid
the transaction’s ID, issueTime its issue time, finishTime its commit/abort time,
reads the versions read, writes the versions written, and committed a flag that 
is true if the transaction is committed. 

We modify our model by updating the Monitor when needed. For example, 
when the coordinator has received all committed messages, the monitor records 
the commit time (T) for that transaction, and sets the “committed” flag to true⁵:


crl [receive-committed] 
{T, PID <- committed(TID, PID’)} 
< M : Monitor | log: (LOG record(TID, T’, T’’, RS, WS, false) LOG’) > 
< PID : Partition | executing: < TID : Txn | AS >, 
committed: TXNS, commitSites: CMTS, AS’ > 
=> 
if CMTS’ [TID] == empty --- all "committed" received 
then < M : Monitor | log: (LOG record(TID, T’, T,RS,WS, true) LOG’) > 
< PID : Partition | executing: noTxn, commitSites: CMTS’, 
committed: (TXNS ;; < TID : Txn | AS >), AS’ >
else < M : Monitor | log: (LOG record(TID, T’, T’’, RS, WS, false) LOG’) > 
< PID : Partition | executing: < TID : Txn | AS >, 
committed: TXNS, commitSites: CMTS’, AS’ > fi 
if CMTS’ := remove(TID, PID’, CMTS) .


Since ROLA is terminating if a finite number of transactions are issued, we 
analyze the different (correctness and performance) properties by inspecting this 
monitor object in the final states, when all transactions are finished. 
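A hypothetical Python rendering of the monitor bookkeeping in receive-committed is shown below. It assumes the log is keyed by transaction identifier (the Maude model uses a list of records), and, like the rule, it only records the finish time once the last committed acknowledgement has arrived.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Record:
    tid: str
    issue_time: float
    finish_time: float = 0.0
    reads: tuple = ()
    writes: tuple = ()
    committed: bool = False


@dataclass
class Coordinator:
    log: Dict[str, Record]                                           # monitor log keyed by tid
    commit_sites: Dict[str, Set[str]] = field(default_factory=dict)  # sites yet to acknowledge
    committed: List[str] = field(default_factory=list)


def receive_committed(c: Coordinator, now: float, tid: str, sender: str) -> None:
    """Handle committed(tid, sender) at global time now."""
    c.commit_sites[tid].discard(sender)          # CMTS' := remove(TID, PID', CMTS)
    if not c.commit_sites[tid]:                  # all "committed" messages received
        record = c.log[tid]
        record.finish_time = now                 # the commit time T
        record.committed = True
        c.committed.append(tid)
```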


Read Atomicity. A system guarantees RA if it prevents fractured reads, and
also prevents transactions from reading uncommitted, aborted, or intermediate
data [3], where a transaction $T_j$ exhibits fractured reads if transaction $T_i$ writes
versions $x_m$ and $y_n$, $T_j$ reads version $x_m$ and version $y_k$, and $k < n$ [3].

We analyze this property by searching for a reachable final state (arrow =>!) 
where the property does not hold: 


search [1] initConfig =>! C:Config < M:Address : Monitor | log: LOG:Record >
such that fracRead(LOG) or abortedRead(LOG) .


5 The additions to the original rule are written in italics. 
The function fracRead checks whether there are fractured reads in the execution
log. There is a fractured read if a transaction TID2 reads X and Y, transaction 
TID1 writes X and Y, TID2 reads the version TSX of X written by TID1, and reads 
a version TSY’ of Y written before TSY (TSY’ < TSY). Since the transactions in 
the log are ordered according to start time, TID2 could appear before or after 
TID1 in the log. We spell out the case when TID1 comes before TID2: 


op fracRead : Record -> Bool . 
ceq fracRead(LOG ; 
record(TID1,T1,T1’,RS1, (version(X,VX,TSX,MDX), version(Y,VY,TSY,MDY)),true) ; LOG’ ; 
record(TID2,T2,T2’, (version(X,VX,TSX,MDX), version(Y,VY’,TSY’,MDY’)), WS2,true) ; LOG’’) 
= true if TSY’ < TSY .
ceq fracRead(LOG ; record(TID2,...) ; LOG’ ; record(TID1,...) ; LOG’’) = true if TSY’ < TSY .
eq fracRead(LOG) = false [owise] . 
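The same check can be sketched in Python over an in-memory log. The sketch assumes versions are (key, value, timestamp) triples with totally ordered integer timestamps and that a version is identified by its key and timestamp. Because it tries every ordered pair of committed records, it covers both orderings of TID1 and TID2 without a second equation; Record and frac_read are hypothetical names.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple

Version = Tuple[str, int, int]   # (key, value, timestamp)


@dataclass
class Record:
    tid: str
    reads: List[Version]
    writes: List[Version]
    committed: bool


def frac_read(log: List[Record]) -> bool:
    """True if a committed T2 read T1's version of X but an older version of a Y that T1 also wrote."""
    for t1, t2 in permutations([r for r in log if r.committed], 2):
        written = {key: ts for key, _value, ts in t1.writes}
        read = {key: ts for key, _value, ts in t2.reads}
        for x, tsx in written.items():
            if read.get(x) != tsx:
                continue                       # T2 did not read T1's version of X
            for y, tsy in written.items():
                if y != x and y in read and read[y] < tsy:
                    return True                # TSY' < TSY: fractured read
    return False
```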


The function abortedRead checks whether a transaction TID2 reads a version 
TSX that was written by an aborted (flag false) transaction TID1: 


op abortedRead : Record -> Bool . 
eq abortedRead(LOG ;
record(TID1, T1,T1’, RS1, (version(X,VX,TSX,MDX), VS), false) ; LOG’ ;
record(TID2, T2, T2’, (version(X,VX,TSX,MDX), VS’), WS2, true) ; LOG’’) = true .
eq abortedRead(LOG ; record(TID2,...) ; LOG’ ; record(TID1,...) ; LOG’’) = true .
eq abortedRead(LOG) = false [owise] . 
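A Python counterpart of abortedRead is sketched below under the same assumptions as before: versions are (key, value, timestamp) triples identified by key and timestamp, the log is an in-memory list, and the names Record and aborted_read are hypothetical. Collecting the writes of aborted transactions into a set makes the order of TID1 and TID2 in the log irrelevant.

```python
from dataclasses import dataclass
from typing import List, Tuple

Version = Tuple[str, int, int]   # (key, value, timestamp)


@dataclass
class Record:
    tid: str
    reads: List[Version]
    writes: List[Version]
    committed: bool


def aborted_read(log: List[Record]) -> bool:
    """True if a committed transaction read a version written by an aborted transaction."""
    aborted_writes = {(key, ts) for r in log if not r.committed for key, _value, ts in r.writes}
    return any((key, ts) in aborted_writes
               for r in log if r.committed for key, _value, ts in r.reads)
```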


No Lost Updates. We analyze the PLU property by searching for a final state in 
which the monitor shows that an update was lost: 


search [1] initConfig =>! C:Config < M:Address : Monitor | log: LOG:Record >
such that lu(LOG) .


The function lu, described in [14], checks whether there are lost updates in LOG. 

We have performed our analysis with 4 different initial states, with up to 8 
transactions, 2 data items and 4 partitions, without finding a violation of RA 
or PLU. We have also model checked the causal consistency (CC) property with 
the same initial states, and found a counterexample showing that ROLA does 
not satisfy CC. (This might imply that our initial states are large enough so 
that violations of RA or PLU could have been found by model checking.) Each 
analysis command took about 30 seconds to execute on a 2.9GHz Intel 4-Core 
i7-3520M CPU with 3.7 GB memory. 


6 Statistical Model Checking of ROLA and Walter 


The weakest consistency model in [4] guaranteeing RA and PLU is PSI, and 
the main system providing PSI is Walter [20]. ROLA must therefore outperform 
Walter to be an attractive design. To quickly check whether ROLA does so, 
we have also modeled Walter—without its data replication features—in Maude 
(see [11] and https://sites.google.com/site/fase18submission/maude-spec), and 
use statistical model checking with PVESTA to compare the performance of 
ROLA and Walter in terms of throughput and average transaction latency. 
Extracting Performance Measures from Executions. PVESTA estimates
the expected (average) value of an expression on a run, up to a desired statistical
confidence. The key to performing statistical model checking is therefore to define a
measure on runs. Using the monitor in Sect. 5 we can define a number of functions 
on (states with) such a monitor that extract different performance metrics from 
this “system execution log.” 

The function throughput computes the number of committed transactions 
per time unit. committedNumber computes the number of committed transac- 
tions in LOG, and totalRunTime returns the time when all transactions are fin- 
ished (i.e., the largest finishTime in LOG): 


op throughput : Config -> Float [frozen] .
eq throughput(< M : Monitor | log: LOG > REST)
= committedNumber(LOG) / totalRunTime(LOG) .


The function avgLatency computes the average transaction latency by divid- 
ing the sum of the latencies of all committed transactions by the number of such 
transactions: 


op avgLatency : Config -> Float [frozen] .
eq avgLatency(< M : Monitor | log: LOG > REST)
= totalLatency(LOG) / committedNumber(LOG) .


where totalLatency computes the sum of all transaction latencies (time 
between the issue time and the finish time of a committed transaction). 
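The two measures can be mirrored in Python on a list of finished records. The Record fields and helper names below are hypothetical stand-ins for the Maude functions of the same purpose; like the Maude equations, avg_latency would divide by zero if no transaction committed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    tid: str
    issue_time: float
    finish_time: float      # commit or abort time
    committed: bool


def committed_number(log: List[Record]) -> int:
    return sum(1 for r in log if r.committed)


def total_run_time(log: List[Record]) -> float:
    """Time at which all transactions are finished: the largest finish time in the log."""
    return max(r.finish_time for r in log)


def total_latency(log: List[Record]) -> float:
    """Sum of (finish time - issue time) over committed transactions."""
    return sum(r.finish_time - r.issue_time for r in log if r.committed)


def throughput(log: List[Record]) -> float:
    return committed_number(log) / total_run_time(log)


def avg_latency(log: List[Record]) -> float:
    return total_latency(log) / committed_number(log)


log = [Record("t1", 0.0, 4.0, True), Record("t2", 1.0, 3.0, True), Record("t3", 2.0, 10.0, False)]
print(throughput(log), avg_latency(log))  # 0.2 3.0
```

Note that the aborted transaction t3 still stretches the total run time to 10.0, which lowers throughput, but it does not contribute to the average latency.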


Generating Initial States. We use an operator init to probabilistically generate
initial states: init(rtx, wtx, rwtx, part, keys, rops, wops, rwops, distr) generates
an initial state with rtx read-only transactions, wtx write-only transactions,
rwtx read-write transactions, part partitions, keys data items, rops operations
per read-only transaction, wops operations per write-only transaction, rwops
operations per read-write transaction, and distr the key access distribution
(the probability that an operation accesses a certain data item). To capture the 
fact that some data items may be accessed more frequently than others, we also 
use Zipfian distributions in our experiments. 
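A sketch of how such workloads could be generated is given below. The paper does not state the Zipfian exponent or how keys are named, so the exponent s = 1.0, the key names k1, k2, ... and the functions zipf_weights, pick_keys and gen_txns are assumptions; partitions and the contents of the operations are omitted.

```python
import random
from typing import List, Tuple


def zipf_weights(n_keys: int, s: float = 1.0) -> List[float]:
    """Normalized Zipfian weights: the key of rank k gets probability proportional to 1 / k**s."""
    raw = [1.0 / (k ** s) for k in range(1, n_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def pick_keys(rng: random.Random, n_ops: int, n_keys: int,
              distr: str = "uniform", s: float = 1.0) -> List[str]:
    """Choose the key accessed by each of n_ops operations."""
    keys = [f"k{i}" for i in range(1, n_keys + 1)]
    weights = None if distr == "uniform" else zipf_weights(n_keys, s)  # None means uniform
    return rng.choices(keys, weights=weights, k=n_ops)


def gen_txns(rng: random.Random, rtx: int, wtx: int, rwtx: int, n_keys: int,
             rops: int, wops: int, rwops: int,
             distr: str = "uniform") -> List[Tuple[str, List[str]]]:
    """Hypothetical analogue of init(...): (kind, accessed keys) per transaction."""
    spec = [("read-only", rtx, rops), ("write-only", wtx, wops), ("read-write", rwtx, rwops)]
    return [(kind, pick_keys(rng, n_ops, n_keys, distr))
            for kind, count, n_ops in spec for _ in range(count)]


txns = gen_txns(random.Random(7), rtx=2, wtx=1, rwtx=3, n_keys=10,
                rops=2, wops=4, rwops=3, distr="zipf")
```

Under the Zipfian setting the low-ranked keys (k1, k2, ...) are chosen far more often, which is what creates the contention on "hot" data items.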


Statistical Model Checking Results. We performed our experiments under 
different configurations, with 200 transactions, 2–4 operations per transaction,
up to 200 data items and 50 partitions, with lognormal message delay distribu- 
tions, and with uniform and Zipfian data item access distributions. 

The plots in Fig. 1 show the throughput as a function of the percentage of 
read-only transactions, number of partitions, and number of keys (data items), 
sometimes with both uniform and Zipfian distributions. The plots show that 
ROLA outperforms Walter for all parameter combinations. More partitions give
ROLA higher throughput (since concurrency increases), as opposed to Walter
(since Walter has to propagate transactions to more partitions to advance the
vector timestamp). We only plot the results under uniform key access distribution,
which are consistent with the results using Zipfian distributions.

The plots in Fig. 2 show the average transaction latency as a function of the 
same parameters as the plots for throughput. Again, we see that ROLA out- 
performs Walter in all settings. In particular, this difference is quite large for 
write-heavy workloads; the reason is that Walter incurs more and more overhead 
for providing causality, which requires background propagation to advance the 
vector timestamp. The latency tends to converge under read-heavy workload 
(because reads in both ROLA and Walter can commit locally without certification),
but ROLA still has noticeably lower latency than Walter.
Fig. 1. Throughput comparison under different workload conditions.


Computing the probabilities took 6 hours (worst case) on 10 servers, each 
with a 64-bit Intel Quad Core Xeon E5530 CPU with 12GB memory. Each point 
in the plots represents the average of three statistical model checking results. 


7 Related Work 


Maude and PVESTA have been used to model and analyze the correctness and 
performance of a number of distributed data stores: the Cassandra key-value 
store [12,15], different versions of RAMP [10,13], and Google’s Megastore [7,8]. 
In contrast to these papers, our paper uses formal methods to develop and 
validate an entirely new design, ROLA, for a new consistency model. 
Concerning formal methods for distributed data stores, engineers at Amazon 
have used TLA+ and its model checker TLC to model and analyze the correct- 
ness of key parts of Amazon’s celebrated cloud computing infrastructure [17]. 
In contrast to our work, they only use formal methods for correctness analysis;
indeed, one of their complaints is that they cannot use their formal method for 
performance estimation. The designers of the TAPIR transaction protocol for 
distributed storage systems have also specified and model checked correctness 
(but not performance) properties of their design using TLA+ [22]. 
Fig. 2. Average latency comparison across varying workload conditions.


8 Conclusions 


We have presented the formal design and analysis of ROLA, a distributed trans- 
action protocol that supports a new consistency model not present in the survey 
by Cerone et al. [4]. Using formal modeling and both standard and statistical 
model checking analyses we have: (i) validated ROLA’s RA and PLU consis- 
tency requirements; and (ii) analyzed its performance requirements, showing 
that ROLA outperforms Walter in all performance measures. 

This work has shown, to the best of our knowledge for the first time, that the 
design and validation of a new distributed transaction protocol can be achieved 
relatively quickly before its implementation by the use of formal methods. Our 
next planned step is to implement ROLA, evaluate it experimentally, and com- 
pare the experimental results with the formal analysis ones. In previous work 
on existing systems such as Cassandra [9] and RAMP [3], the performance esti- 
mates obtained by formal analysis and those obtained by experimenting with 
the real system were basically in agreement with each other [10,12]. This con- 
firmed the useful predictive power of the formal analyses. Our future research 
will investigate the existence of a similar agreement for ROLA. 
Abstract. A formal semantics is introduced for a Process Network
model, which combines streaming and reactive control processing with 
task parallelism properties suitable to exploit multi-cores. Applications 
that react to environment stimuli are implemented by communicating 
sporadic and periodic tasks, programmed independently from an exe- 
cution platform. Two functionally equivalent semantics are defined, one 
for sequential execution and one real-time. The former ensures functional
determinism by implying precedence constraints between jobs (task executions);
hence, the program outputs are independent of the task scheduling.
The latter specifies concurrent execution on a real-time platform, guaranteeing
all the model’s constraints; it has been implemented in an executable formal
specification language. The model’s implementation runs on multi-core embedded
systems, and supports integration of run-time managers for shared HW/SW
resources (e.g. for controlling QoS, resource interference or power consumption).
Finally, a model transformation approach has been developed, which made it
possible to port and statically schedule a real spacecraft on-board application
on an industrial multi-core platform.
1 Introduction


The proliferation of multi-cores in timing-critical embedded systems requires a 
programming paradigm that addresses the challenge of ensuring predictable tim- 
ing. Two prominent paradigms and a variety of associated languages are widely 
used today. For streaming signal processing, synchronous dataflow languages [18] 
allow writing programs in the form of directed graphs with nodes for their func- 
tions and arcs for the data flows between functions. Such programs can exploit 
concurrency when they are deployed to multi-cores [15], while their functions 
can be statically scheduled [17] to ensure a predictable timing behavior. 

On the other hand, the reactive-control synchronous languages [12] are used 
for reactive systems (e.g., flight control systems) expected to react to stimuli 
from the environment within strict time bounds. The synchronicity abstraction 
eliminates the non-determinism from the interleaving of concurrent behaviors. 

The synchronous languages lack appropriate concepts for task parallelism 
and timing-predictable scheduling on multiprocessors, whereas the streaming 
models do not support reactive behavior. The Fired Priority Process Network 
(FPPN) model of computation has been proposed as a trade-off between stream- 
ing and reactive control processing, for task parallel programs. In FPPNs, task 
invocations depend on a combination of periodic data availability (similar to 
streaming models) and sporadic control events. Static scheduling methods for 
FPPNs [20] have demonstrated a predictable timing on multi-cores. A first imple- 
mentation of the model [22] in an executable formal specification language called 
BIP (Behavior, Interaction, Priority) exists, more specifically in its real-time 
dialect [3] extended to tasks [10]. In [21], the FPPN scheduling was studied by 
taking into account resource interference; an approach for incrementally plug- 
ging online schedulers for HW/SW resource sharing (e.g., for QoS management) 
was proposed. 

This article presents the first comprehensive FPPN semantics definition, at 
two levels: semantics for sequential execution, which ensures functional deter- 
minism, and a real-time semantics for concurrent task execution while adhering 
to the constraints of the former semantics. Our definition is related to a new 
model transformation framework, which enables programming at a high level by 
embedding FPPNs into the architecture description, and allows an incremental 
refinement in terms of task interactions and scheduling¹. Our approach is
demonstrated with a real spacecraft on-board application ported onto the European
Space Agency’s quad-core Next Generation Microprocessor (NGMP). 


2 Related Work 


Design frameworks for embedded applications, like Ptolemy II [6] and 
PeaCE [11], allow designing systems through refining high-level models. They 
are based on various models of computation (MoC), but we focus mainly on 
those that support task scheduling with timing constraints. Dataflow MoCs that
stem from Kahn Process Networks [16] have been adapted to the timing
constraints of signal processing applications, and design frameworks like
CompSoC [13] have been introduced; these MoCs do not support reactive behavior
and sporadic tasks, unlike the FPPN MoC, which can be seen as an extension in
that direction. DOL Critical [10] ensures predictable timing, but its functional
behavior depends on scheduling. Another timing-aware reactive MoC that does not
guarantee functional determinism is DPML [4]. The Prelude design framework [5]
specifies applications in a synchronous reactive MoC, but due to its expressive
power it is hard to derive scheduling analyses unless its semantics is restricted.
Last but not least, although reactive process networks (RPN) [8] do not support
scheduling with timing constraints, they lay an important foundation for
combining the streaming and reactive control behaviors. In the FPPN semantics we
reuse an important principle of RPN semantics, namely, performing the maximal
execution run of a dataflow network in response to a control event.


1 The framework is online at [2].


3 A PN Model for Streaming and Reactive Control 


An FPPN model is composed of Processes, Data Channels and Event Generators. 
A Process represents a software subroutine that operates with internal variables
and input/output channels connected to it through ports. The functional code of
the application is defined in processes, whereas the necessary middleware elements
of the FPPN are channels, event generators, and functional priorities, which define
a relation between the processes to ensure deterministic execution.

An example process is shown in Fig. 1. 
This process performs a check on the internal variables; if the check succeeds,
it reads from the input channel and, if the value read is valid (refer to the
channel definition below), computes its square and writes it to the output
channel. A call to the process subroutine is referred to as a job. Like real-time
jobs, the subroutine should have a bounded execution time, subject to WCET
(worst-case execution time) analysis.


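#include <stdbool.h>

/* Channel access primitives (prototypes assumed; provided by the FPPN
   middleware for the input channel X and the output channel Y). */
void XIF_Read(float *x, bool *x_valid);
void YIF_Write(const float *y);

/* Internal state variables of the "Square" process. */
static int SQ_index;
static int SQ_length;

void SQ_Initialize(void) {
    SQ_index = 0;
    SQ_length = 200;
}

void SQ_PeriodicJob(void) {
    float x, y;
    bool x_valid;
    if (SQ_index < SQ_length) {
        XIF_Read(&x, &x_valid);      /* non-blocking read */
        if (x_valid) {
            y = x * x;
            YIF_Write(&y);           /* non-blocking write */
        }
    }
    SQ_index++;
}


Fig. 1. Example code for “Square” process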


An FPPN is defined by two directed graphs. The first is a (possibly cyclic)
graph $(P, C)$, whose nodes $P$ are processes and edges $C$ are channels for pairs
of communicating processes with a dataflow direction, i.e., from the writer to
the reader (there are also external channels interacting with the environment).
Fig. 2. Example Fixed Priority Process Network


A channel is denoted by $c \in C$ or by a pair $(p_1, p_2)$ of writer and reader. For
$p_1$ the channel is said to be an output and for $p_2$ an input. The second graph
$(P, FP)$ is the functional priority directed acyclic graph (DAG) defining a
functional priority relation between processes. For any two communicating
processes we require

$$(p_1, p_2) \in C \implies (p_1, p_2) \in FP \lor (p_2, p_1) \in FP$$

i.e., a functional priority either follows the direction of dataflow or the opposite.
Given $(p_1, p_2) \in FP$, $p_1$ is said to have a higher priority than $p_2$.

The FPPN in Fig. 2 represents an imaginary data processing application,
where the “X” sporadic process generates values, “Square” calculates the square 
of the received value and the “Y” periodic process serves as sink for the squared 
value. A sporadic event (command from the environment) invokes “X”, which is 
annotated by its minimal inter-arrival time. The periodic processes are annotated 
by their periods. The two types of non-blocking channels are also illustrated. The 
FIFO (or mailbox) has a semantics of a queue. The blackboard remembers the 
last written value that can be read multiple times. The arc depicted above the 
channels indicates the functional priority relation FP. Additionally, the external 
input/output channels are shown. In this example, the dataflow in the channels
goes in the opposite direction of the functional priority order. Note that, by analogy
to the scheduling priorities, a convenient method to define priority is to assign 
a unique priority index to every process, the smaller the index the higher the 
priority. This method is demonstrated in Fig. 2. In this case the minimal required 
FP relation would be defined by joining each pair of communicating processes 
by an arc going from the higher-priority process to the lower-priority one. 
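The priority-index method just described can be sketched in C, the language of Fig. 1. This is only an illustration under our own assumptions: processes are numbered 0..n-1, and the `Channel` and `FpArc` types and the `derive_fp` function are hypothetical names, not part of the FPPN framework. Since the indices are unique and every arc points from a smaller index to a larger one, the derived relation cannot contain a cycle, as the FP DAG requires.

```c
#include <stddef.h>

/* Hypothetical encoding: processes are numbered 0..n-1 and each has a
   unique priority index (smaller index = higher functional priority). */
typedef struct { int writer, reader; } Channel;   /* one internal channel */
typedef struct { int from, to; } FpArc;           /* (from, to) in FP */

/* Derive the minimal FP relation: one arc per channel, oriented from the
   higher-priority process to the lower-priority one. Writes nch arcs to
   out[] and returns their number. Pairs joined by several channels yield
   duplicate arcs, which this sketch does not remove. */
size_t derive_fp(const int *prio, const Channel *ch, size_t nch, FpArc *out) {
    for (size_t i = 0; i < nch; i++) {
        int a = ch[i].writer, b = ch[i].reader;
        if (prio[a] < prio[b]) {
            out[i].from = a; out[i].to = b;
        } else {
            out[i].from = b; out[i].to = a;
        }
    }
    return nch;
}
```

For instance, with the hypothetical indices Y = 1, Square = 2, X = 3 (chosen so that, as in Fig. 2, priority runs against the dataflow), the channels X→Square and Square→Y yield the arcs Square→X and Y→Square.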

Let us denote by $\mathit{Var}$ the set of all variables. For a variable $x$ or an ordered
set (vector) $X$ of variables we denote by $D(x)$ (resp. $D(X)$) its domain (or vector
of domains), i.e., the set(s) of values that the variable(s) may take. Valuations of
variables $X$ are written as $X^0, X^1, \ldots$, or simply as $X$, dropping the superscript.
Each variable is assumed to have a unique initial valuation. From the software 
point of view, this means that all variables are initialized by a default value. 

$\mathit{Var}$ includes all process state variables $X_p$ and the channel state variables
$\gamma_c$. The current valuation of a state variable is often referred to simply as its state.
For the variable of a channel $c$, an alphabet $\Sigma_c$ and a type $CT_c$ are defined; a
channel type consists of a write ‘operation’ ($W_c$) and a read ‘operation’ ($R_c$), defined
as functions specifying the variable evolution. The function
$W_c : D(c) \times \Sigma_c \to D(c)$ defines the update after writing a symbol $s \in \Sigma_c$
to the channel, whereas $R_c : D(c) \to D(c) \times \Sigma_c$ maps the channel state to a
pair $(R_{c1}, R_{c2})$, where $R_{c1}$ is the new channel state and $R_{c2}$ is the symbol that
is read from the channel. For a FIFO channel, its state $\gamma_c$ is an (initially empty)
string and the write operation left-concatenates symbol $s$ to the string:
$W_c(\gamma_c, s) = s \circ \gamma_c$. For the same channel, $R_c(\gamma_c \circ s) = (\gamma_c, s)$,
i.e., we read and remove the last symbol from the string. The write and read
functions are defined for each possible channel state, thus rendering the channels
non-blocking. This is implemented by including $\bot$ in the alphabet, in order to
define the read operation when the channel does not contain any ‘meaningful’ data.
Thus, reading from an empty FIFO is defined by $R_c(\epsilon) = (\epsilon, \bot)$, where
$\epsilon$ denotes the empty string. For a blackboard channel, its state is an (initially
empty) string that contains at most one symbol, the last symbol written to the
channel: $W_c(\gamma_c, s) = s$, $R_c(\gamma_c) = (\gamma_c, \gamma_c)$ for $\gamma_c \neq \epsilon$, and
$R_c(\epsilon) = (\epsilon, \bot)$.
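To make the two channel types concrete, the following C sketch models a bounded FIFO and a blackboard as non-blocking read/write functions over integer symbols. It is an illustration under stated assumptions: the symbol type (`int`), the bound `FIFO_CAP` and all function names are hypothetical, the symbol $\bot$ is represented by a `valid = false` flag (in the spirit of `x_valid` in Fig. 1), and, unlike the unbounded formal FIFO, this one reports a full buffer by returning `false`. The ring buffer appends new symbols and removes the oldest, which is the same order as left-concatenating on write and reading from the right end.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAP 16  /* hypothetical bound; the formal FIFO is unbounded */

/* FIFO channel state gamma_c: the written symbols, oldest at 'head'. */
typedef struct {
    int buf[FIFO_CAP];
    size_t head;
    size_t len;
} Fifo;

/* W_c: store symbol s as the newest element.
   Returns false if the bounded buffer is full. */
bool fifo_write(Fifo *c, int s) {
    if (c->len == FIFO_CAP) return false;
    c->buf[(c->head + c->len) % FIFO_CAP] = s;
    c->len++;
    return true;
}

/* R_c: remove and return the oldest symbol; on an empty FIFO the state
   is unchanged and the "bottom" symbol is returned (*valid = false). */
void fifo_read(Fifo *c, int *s, bool *valid) {
    if (c->len == 0) { *valid = false; return; }
    *s = c->buf[c->head];
    c->head = (c->head + 1) % FIFO_CAP;
    c->len--;
    *valid = true;
}

/* Blackboard channel state: at most one symbol, the last one written. */
typedef struct {
    int value;
    bool written;
} Blackboard;

/* W_c(gamma_c, s) = s: overwrite the stored symbol. */
void bb_write(Blackboard *c, int s) { c->value = s; c->written = true; }

/* R_c(gamma_c) = (gamma_c, gamma_c): non-destructive read;
   on the empty state the "bottom" symbol is returned (*valid = false). */
void bb_read(const Blackboard *c, int *s, bool *valid) {
    *valid = c->written;
    if (c->written) *s = c->value;
}
```

Zero-initializing either structure (e.g., `Fifo f = {0};`) gives the empty initial state $\epsilon$ required by the text.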

An external channel’s state is an infinite sequence of samples, i.e., variables
$c[1], c[2], c[3], \ldots$ with the same domain. For a sample $c[k]$, $k$ is the sample index.
Though the sequence is infinite, no infinite memory is required, because each
sample can be accessed (as will be shown) only within a limited time interval. If $c$ is
an external output, the channel type defines the sample write operation in the form
$W_c : D'(c) \times \mathbb{N}_+ \times \Sigma_c \to D'(c)$, where $D'(c)$ is the sample domain, the
second argument is the sample index and the result is the new sample value. For
an external input, we have the sample read operation
$R_c : D'(c) \times \mathbb{N}_+ \to D'(c) \times \Sigma_c$.
The set of outputs is denoted by $O$ and the set of inputs by $I$.

The program expressions involve variables. Let us call $\mathit{Act}$ the set of all
possible actions that represent operations on variables. An assignment is an
action written as $Y := f(X)$. For the channels, two types of actions are defined:
$x!c$ for writing a variable $x$, and $x?c$ for reading from the channel, where
$D(x) = \Sigma_c$. For external channels, we have $x!_{[k]}c$, $c \in O$ and $y?_{[k]}c$, $c \in I$,
where $[k]$ is the sample index. Actions are defined by a function
$\mathit{Effect} : \mathit{Act} \times D(\mathit{Var}) \to D(\mathit{Var})$, which for every action $a$ states how
the new values of all variables are calculated from their previous values. The
actions are assumed to have zero delay. The physical time is modeled by a special
action $w(T)$ for waiting until time stamp $T$.

An execution trace $\alpha$ is a sequence of actions from $\mathit{Act}$, e.g.,

$$\alpha = w(0),\ x?_{[1]}I_1,\ x := x^2,\ x!c_1,\ w(100),\ y?c_1,\ y!_{[2]}O_1$$

The time stamps in the execution are non-decreasing; each $w(t)$ marks the time
at which the actions that follow it, up to the next time stamp, occur. In the
example, at time 0 we read sample [1] from $I_1$ and compute its square. Then we
write to channel $c_1$. At time 100, we read from $c_1$ and write sample [2] to $O_1$.

A process models a subroutine with a set of locations (code line numbers), 
variables (data) and operators that define a guard on variables (‘if’ condition), 
the action (operator body) and the transfer of control to the next location. 
Definition 1 (Process). Each process $p$ is associated with a deterministic
transition system $(L_p, l_p^0, X_p, I_p, O_p, A_p, T_p)$, with $L_p$ a set of locations,
$l_p^0 \in L_p$ an initial location, and $X_p$ the set of state variables with initial
values $X_p^0$. $I_p$, $O_p$ are the (internal and external) input/output channels. $A_p$ is a
set of actions with variable assignments for $X_p$, reads from $I_p$, and writes to $O_p$.
$T_p$ is a transition relation $T_p \subseteq L_p \times G_p \times A_p \times L_p$, where $G_p$ is the
set of predicates (guarding conditions) defined on the variables from $X_p$.


One execution step $(l_1, X^1, \gamma^1) \xrightarrow{a} (l_2, X^2, \gamma^2)$ for the valuations
$X^1, X^2$ of variables in $X_p$, and the valuations $\gamma^1, \gamma^2$ of channels in $I_p \cup O_p$,
implies that there is a transition $(l_1, g, a, l_2) \in T_p$ such that $X^1$ satisfies the
guarding condition $g$ (i.e., $g(X^1) = \mathit{True}$) and $(X^2, \gamma^2) = \mathit{Effect}(a, (X^1, \gamma^1))$.

Definition 1 prescribes a deterministic transition system: for each location $l_1$,
the guarding conditions enable a single execution step for each possible
valuation $X^1$.


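Definition 2 (Process job execution). A job execution $(X^1, \gamma^1) \xrightarrow{\alpha}_p (X^2, \gamma^2)$
is a non-empty sequence of process $p$ execution steps starting and ending
in $p$’s initial location $l_p^0$, without intermediate occurrences of $l_p^0$:

$$(l_p^0, X^1, \gamma^1) \xrightarrow{a_1} (l_1, X_1, \gamma_1) \xrightarrow{a_2} \cdots \xrightarrow{a_n} (l_p^0, X^2, \gamma^2), \quad n \geq 1,\; l_i \neq l_p^0 \text{ for } 0 < i < n$$

where $\alpha = a_1 \circ a_2 \circ \cdots \circ a_n$.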


From a software point of view, a job execution is seen as a subroutine run
from a caller location that returns control back to the caller. We assume that at
the $k$-th job execution, the external channels in $I_p$, $O_p$ are read/written at sample index $[k]$.
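To connect Definitions 1 and 2 with the code of Fig. 1, the sketch below re-encodes the “Square” process as an explicit transition system in C. The location names `L0`..`L4`, the `SquareState` struct and the one-slot stand-ins for the input and output channels are our own assumptions for illustration. Each call of `step` fires the single transition enabled for the current location and valuation (determinism), and `job_execute` iterates steps from the initial location until control returns to it, which is a job execution in the sense of Definition 2.

```c
#include <stdbool.h>

/* Hypothetical explicit encoding of the "Square" process of Fig. 1. */
typedef enum { L0, L1, L2, L3, L4 } Loc;   /* L0 is the initial location */

typedef struct {
    int index, length;   /* process state variables X_p */
    float x, y;
    bool x_valid;
    /* stand-ins for the channel states (assumption: one-slot channels) */
    float in_value;  bool in_valid;
    float out_value; int  out_writes;
} SquareState;

/* One execution step: exactly one transition is enabled for each
   (location, valuation) pair; returns the target location. */
static Loc step(Loc l, SquareState *s) {
    switch (l) {
    case L0:                                        /* guard: index < length */
        if (s->index < s->length) return L1;
        s->index++;
        return L0;
    case L1:                                        /* action x?c (consumes) */
        s->x = s->in_value;
        s->x_valid = s->in_valid;
        s->in_valid = false;
        return L2;
    case L2:                                        /* guard: x_valid */
        if (s->x_valid) { s->y = s->x * s->x; return L3; }
        s->index++;
        return L0;
    case L3:                                        /* action y!c */
        s->out_value = s->y;
        s->out_writes++;
        return L4;
    case L4:
        s->index++;
        return L0;
    }
    return L0;
}

/* A job execution: steps from L0 until control returns to L0.
   Returns the number of execution steps taken. */
int job_execute(SquareState *s) {
    int n = 0;
    Loc l = L0;
    do { l = step(l, s); n++; } while (l != L0);
    return n;
}
```

With a valid input sample, a job takes five steps and writes one output; a subsequent job finds the input consumed and returns to `L0` after three steps without writing.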

In an FPPN, there is a one-to-one mapping between every process p and 
the respective event generator e that defines the constraints of interaction with 
the environment. Every $e$ is associated with (possibly empty) subsets $I_e$, $O_e$ of
the external input/output (I/O) channels. Those are the external channels that
the process $p$ can access: $I_e \subseteq I_p$, $O_e \subseteq O_p$. The I/O sets of different event
generators are disjoint, so different processes cannot share external channels.

Every $e$ defines the set of possible sequences of time stamps $\tau_k$ for the ‘event’
of the $k$-th invocation of process $p$, and a relative deadline $d_e \in \mathbb{Q}_+$. The intervals
$[\tau_k, \tau_k + d_e]$ determine when the $k$-th job execution can occur. There are two
important reasons for this timing constraint. First, if the subsets $I_e$ or $O_e$ are not
empty, then these intervals indicate the timing windows when the environment
opens the $k$-th sample in the external I/O channels for read or write access at the
$k$-th job execution. Second, $\tau_k$ defines the order in which the $k$-th job should
execute: the earlier it is invoked, the earlier it should execute. Concerning the $\tau_k$
sequences, two event generator types are considered, namely multi-periodic and
sporadic. Both are parameterized by a burst size $m_e$ and a period $T_e$. Bursts
of $m_e$ periodic events occur at $0, T_e, 2T_e$, etc. For sporadic events, at most
$m_e$ events can occur in any half-closed interval of length $T_e$. In the sequel we
associate the attributes of an event generator with the corresponding process,
e.g., $T_p$ and $d_p$.
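The two generator types can be illustrated with a small C sketch: `periodic_stamp` gives the invocation time of the $k$-th event of a multi-periodic generator with burst size $m$ and period $T$, and `sporadic_ok` checks whether a recorded, non-decreasing sequence of sporadic time stamps respects the bound of at most $m$ events in any half-closed interval of length $T$. Both function names are hypothetical. The check relies on the observation that, for a sorted sequence, some half-closed window of length $T$ contains $m + 1$ events exactly when $t[i+m] - t[i] < T$ for some $i$.

```c
#include <stdbool.h>
#include <stddef.h>

/* Multi-periodic generator: bursts of m events at 0, T, 2T, ...
   Returns the time stamp of the k-th invocation (k = 1, 2, ...). */
long periodic_stamp(long k, long m, long T) {
    return ((k - 1) / m) * T;
}

/* Sporadic generator: checks that the non-decreasing sequence t[0..n-1]
   has at most m events in any half-closed interval of length T. */
bool sporadic_ok(const long *t, size_t n, size_t m, long T) {
    for (size_t i = 0; i + m < n; i++)
        if (t[i + m] - t[i] < T) return false;   /* m + 1 events too close */
    return true;
}
```

For example, with $m = 1$ and $T = 100$ the stamps 0, 100, 150 are rejected (the last two are 50 apart), while with $m = 2$ the stamps 0, 50, 100 are accepted.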
Definition 3 (FPPN). An FPPN is a tuple $PN = (P, C, FP, \epsilon_p, I_e, O_e,
d_e, \Sigma_c, CT_c)$, where $P$ is a set of processes and $C \subseteq P \times P$ is a set of
internal channels, with $(P, C)$ defining a (possibly cyclic) directed graph. An acyclic
directed graph $(P, FP)$ is also defined, with $FP \subseteq P \times P$ a functional priority
relation (if $(p_1, p_2) \in FP$, we also write $p_1 \to p_2$). This relation should be
defined at least for processes accessing the same channel, i.e.,
$(p_1, p_2) \in C \implies p_1 \to p_2 \lor p_2 \to p_1$. $\epsilon_p$ maps every process $p$ to a unique
event generator, whereas $I_e$ and $O_e$ map each event generator to (possibly empty)
partitions of the global set of external input channels $I$ and output channels $O$,
resp. $d_e$ defines the relative deadline for accessing the I/O channels of generator $e$,
$\Sigma_c$ defines alphabets for internal and external I/O channels, and $CT_c$ specifies
the channel types.


The priority FP defines the order in which two processes are executed when
invoked at the same time. It is not necessarily a transitive relation. For example,
if $(p_1, p_2) \in FP$, $(p_2, p_3) \in FP$, and both $p_1$ and $p_3$ get invoked simultaneously,
then FP does not imply any execution-order constraint between them unless
$p_2$ is also invoked at the same time. The functional priorities differ from the
scheduling priorities: the former disambiguate the order of read/write accesses
to internal channels, whereas the latter ensure satisfaction of timing constraints.


4 Zero-Delay Semantics for the FPPN Model 


The functional determinism requirement prescribes that the data sequences and 
time stamps at the outputs are a well-defined function of the data sequences and 
time stamps at the inputs. This is ensured by the so-called functional priorities. 
In essence, functional priorities control the process job execution order, which 
is equivalent to the effect of fixed priorities on a set of tasks under uniprocessor 
fixed-priority scheduling with zero task execution times. A distinct feature of the 
FPPN model is that priorities are not used directly in scheduling, but rather in 
the definition of the model’s semantics. From now on, the term ‘task’ will refer to
an FPPN process. Following the usual real-time systems terminology, invoking
a task implies generation of a job which has to be executed before the task’s
deadline. The so-called precedence constraints, i.e., the semantic restrictions
on the FPPN job execution order, are implied first by the time stamps when the
tasks are invoked and second by the functional priorities. In this section, we
define these constraints in terms of a sequential order (an execution trace).

The FPPN model requires that all simultaneous process invocations should 
be signaled synchronously. This can be realized by introducing a periodic clock
with a sufficiently small period (the greatest common divisor of all $T_p$), such that
invocation events can only occur at clock ticks, synchronously. Two variant semantics are then
defined, namely the zero-delay and the real-time semantics. 
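As a worked illustration of the clock choice, the greatest common divisor can be computed with Euclid's algorithm. The function names below are hypothetical. With the 500 ms and 50 ms periods of the case study in Sect. 7, the clock period would be 50 ms. That case study also uses invocation offsets (450 ms and 30 ms); if offsets are present, folding them into the gcd as well (giving 10 ms there) keeps every invocation on a clock tick, which is our own extension of the rule stated above.

```c
#include <stddef.h>

/* Euclid's algorithm for the greatest common divisor. */
long gcd_l(long a, long b) {
    while (b != 0) {
        long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Clock period: gcd of all values in T[0..n-1] (uses gcd(0, x) = x). */
long clock_period(const long *T, size_t n) {
    long g = 0;
    for (size_t i = 0; i < n; i++) g = gcd_l(g, T[i]);
    return g;
}
```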

The zero-delay semantics imposes an ordering of the job executions assuming 
that they have zero delay and that they are never postponed to the future. Since 
in this case the deadlines are always met even without exploiting parallelism, a 
sequential execution of processes is considered for simplicity. The semantics is 
defined in terms of the rules for constructing the execution trace of the FPPN for 
a given sequence $(t_1, P^1), (t_2, P^2), \ldots$, where $t_1 < t_2 < \cdots$ are time stamps and
$P^i$ is the multiset of processes invoked at time $t_i$. For convenience, we associate
each ‘invoked process’ $p$ in $P^i$ with its respective invocation event $e_p$. The execution
trace has the form:


$$\mathit{Trace}(PN) = w(t_1) \circ \alpha^1 \circ w(t_2) \circ \alpha^2 \circ \cdots$$


where $\alpha^i$ is a concatenation of the job executions of the processes in $P^i$, in an
order such that if $p_1 \to p_2$ then the job(s) of $p_1$ execute earlier than those of $p_2$.


Definition 4 (Configuration). An FPPN configuration $(\pi, \gamma, \mathcal{P})$ consists of:


— a process configuration $\pi$, a function that assigns to every process a state
$\pi(p) \in D(X_p)$

— a channel configuration $\gamma$, i.e., the states of internal and external channels

— a set of pending events $\mathcal{P}$


Executing one job in a process network: 


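$$\frac{(\pi(p), \gamma) \xrightarrow{\alpha}_p (X', \gamma') \;\land\; e_p \in \mathcal{P} \;\land\; \nexists p' : e_{p'} \in \mathcal{P} \land (p', p) \in FP}{(\pi, \gamma, \mathcal{P}) \xrightarrow{\alpha}_{PN} (\pi\{X'/p\}, \gamma', \mathcal{P} \setminus \{e_p\})}$$

where $\pi\{X'/p\}$ is obtained from $\pi$ by replacing the state of $p$ by $X'$.

Given a non-empty set of events $\mathcal{P}$ invoked at time $t$, a maximal execution
run of a process network is defined by a sequence of job executions that continues
until the set of pending events is empty.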


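$$\frac{(\pi^0, \gamma^0, \mathcal{P}) \xrightarrow{\alpha_1}_{PN} (\pi_1, \gamma_1, \mathcal{P} \setminus \{e_{p_1}\}) \xrightarrow{\alpha_2}_{PN} \cdots \xrightarrow{\alpha_n}_{PN} (\pi^1, \gamma^1, \emptyset)}{(\pi^0, \gamma^0) \xrightarrow{w(t) \circ \alpha_1 \circ \alpha_2 \circ \cdots \circ \alpha_n}_{PN(\mathcal{P})} (\pi^1, \gamma^1)}$$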
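The job-selection rule and the maximal run can be mimicked by a short C loop. It is a sketch under simplifying assumptions: at most one pending event per process (the text allows multisets), FP stored as a Boolean adjacency matrix, job bodies omitted (only the execution order is recorded), and ties between unblocked processes broken by the lowest process number. The tie-breaking is arbitrary: the rule allows any unblocked process, and by Definition 3 processes that share a channel are always related by FP. The names `FpGraph`, `MAXP` and `maximal_run` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAXP 8   /* hypothetical upper bound on the number of processes */

/* fp[a][b] is true iff (a, b) is in FP, i.e. a has higher functional priority. */
typedef struct {
    size_t n;
    bool fp[MAXP][MAXP];
} FpGraph;

/* One maximal run at a single time stamp. pending[p] marks a pending event
   e_p (consumed by the run); the executed job order is written to order[].
   Returns the number of jobs executed. */
size_t maximal_run(const FpGraph *g, bool pending[MAXP], size_t order[MAXP]) {
    size_t done = 0;
    for (;;) {
        size_t pick = g->n;   /* g->n means "none found" */
        for (size_t p = 0; p < g->n && pick == g->n; p++) {
            if (!pending[p]) continue;
            bool blocked = false;   /* is some pending p' with (p', p) in FP? */
            for (size_t q = 0; q < g->n; q++)
                if (pending[q] && g->fp[q][p]) { blocked = true; break; }
            if (!blocked) pick = p;
        }
        if (pick == g->n) break;   /* nothing enabled: no pending event left */
        order[done++] = pick;      /* the job of process 'pick' would run here */
        pending[pick] = false;     /* remove e_pick from the pending set */
    }
    return done;
}
```

With FP = {(p2, p1), (p1, p0)} and all three processes pending, the order is p2, p1, p0; if only p0 and p2 are pending, FP imposes nothing between them and the sketch runs p0 first, illustrating the non-transitivity discussed after Definition 3.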


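Given an initial configuration $(\pi^0, \gamma^0)$ and a sequence $(t_1, \mathcal{P}^1), (t_2, \mathcal{P}^2), \ldots$ of
events invoked at times $t_1 < t_2 < \cdots$, the run of the process network is defined by a
sequence of maximal runs that occur at the specified time stamps.

$$\mathit{Run}(PN) = (\pi^0, \gamma^0) \xrightarrow{\alpha^1}_{PN(\mathcal{P}^1)} (\pi^1, \gamma^1) \xrightarrow{\alpha^2}_{PN(\mathcal{P}^2)} \cdots$$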


The execution trace of a process network is a projection of the process network
run to actions:

$$\mathit{Trace}(PN) = \alpha^1 \circ \alpha^2 \circ \cdots$$


This trace represents the time stamps $(w(t_1), w(t_2), \ldots)$ and the data processing
actions executed at every time stamp. From the effect of these actions it is
possible to determine the sequence of values written to the internal and external
channels. These values depend on the states of the processes and internal
channels. The concurrent activities (the job executions) that modify each
process/channel state are themselves deterministic and are ordered relative to
each other in a way that is completely determined by the time stamps and the
FP relation. Therefore we can make the following claim.
Proposition 1 (Functional determinism). The sequences of values written
at all external and internal channels are functionally dependent on the time 
stamps of the event generators and on the data samples at the external inputs. 


Basically, this property means that the outputs calculated by FPPN depend 
only on the event invocation times and the input data sequences, but not on the 
scheduling. To exploit task parallelism, in the real-time semantics of Sect. 5 the
sequential order of execution and the zero-delay assumption are relaxed.


5 Real-Time Semantics for the FPPN Model 


In the real-time semantics, job executions last for some physical time and can 
start concurrently with each other at any time after their invocation. They
respect precedence constraints that impose, for certain pairs of jobs, the same
relative order of execution as in the zero-delay semantics, so that non-deterministic
updates of the states of processes and channels are excluded. To ensure
timeliness, the jobs should complete their execution within the deadline after their
invocation. The semantics specifies the entities for communication, synchronization
and scheduling, and is defined by compilation to an executable formal
specification language.

Our approach is based on (real-time) ‘BIP’ [3] for modeling networks of 
connected timed automata components [24]. We adopt the extension in [10],
which introduces the concept of continuous (asynchronous) automata transitions,
which, unlike the default (discrete) transitions, take a certain physical
time. Besides supporting tasks (via continuous transitions), BIP supports
urgency in timing constraints; these are the timed-automata features required
for adequate modeling and timing verification of dataflow languages [9]. An
important BIP language feature for implementing the functional code of tasks 
is the possibility to specify data actions in imperative programming language 
(C/C++). 

Figure 3 illustrates how an FPPN process is compiled to a BIP component. 
The source code is parsed, searching for primitives that are relevant for the inter- 
actions of the process with other components. The relevant primitives are the 
reads and writes from/to the data channels. For those primitives the generated 
BIP component gets ports, e.g., ‘XIF_Read(IN x,IN valid)’, through which the 
respective transitions inside the component synchronize and exchange data with 
other components. In line with Definition 1, every job execution corresponds 
to a sequence of transitions that starts and ends in an initial location. The first 
transition in this sequence, ‘Start’, is synchronized with the event generator com- 
ponent, which enables this transition only after the process has been invoked. 
The event generator shown in Fig. 3 is a simplified variant for periodic tasks
whose deadline is equal to the period. [22] also describes how we model
internal channels and gives more details on event generator modeling.

To ensure a functional behavior equivalent to zero-delay semantics, the job 
executions have to satisfy precedence constraints between subsequent jobs of
the same process, and between the jobs of process pairs connected by a channel. In both
[Figure: the EventGenerator(T) component and the BIP component compiled from the SQR_Init/SQR_Execute functional code, with ports SQR_Start, XIF_Read(IN x, IN valid), YIF_Write(OUT y) and SQR_Finish; legend: location, initial location, discrete transition, continuous transition, timing condition, data condition, timing action, data action, port, multi-port connector.]

Fig. 3. Compilation of functional code to BIP

cases, the relative execution order of these subsets of jobs is dictated by zero- 
delay semantics, whereby the jobs are executed in the invocation order and the 
simultaneously invoked jobs follow the functional priority order. In this way, we 
ensure deterministic updates in both cases: (i) for the states of processes by 
excluding auto-concurrency, and (ii) for the data shared between the processes 
by excluding data races on the channels. The precedence constraints for (i) are 
satisfied by construction, because BIP components for processes never start a 
new job execution until the previous job of the same process has finished. For the 
precedence constraints in (ii), an appropriate component is generated for each 
pair of communicating processes and plugged incrementally into the network of 
BIP components. 

Figure 4 shows such a component generated for a given pair of processes “A”
and “B”, assuming $(A, B) \in FP$. We saw in Fig. 3 that the evolution of a job
execution goes through three steps: ‘invoke’, ‘start’ and ‘finish’. The component
handles the three steps of both processes in an almost symmetrical way, except in
the method that determines whether the job is ready to start: if two jobs are
simultaneously invoked, then first the job of process “A” gets ready and then,
after it has executed, the job of “B” becomes ready. The “Functional Priority”
[Figure: FunctionalPriority($\hat{g}_A$, $\hat{g}_B$, $q_A$, $q_B$) component imposing precedence when $(A, B) \in FP$, where $\hat{g}_\alpha$ is the invocation poll period, $t_\alpha$ the anticipated invocation time, $q_\alpha$ the job queue size and $Q_\alpha$ a queue of struct (time, active); transitions InvokeA/B, FalseInvokeA/B, StartA/B, FinishA/B; readyα() holds when $Q_\alpha$.Head.active and either $Q_\alpha$.Head.time < $Q_{\bar\alpha}$.Head.time, or the two head times are equal and $\alpha = A$.]

Fig. 4. Imposing precedence order between “A”, “B” (“A” has higher functional priority)


component maintains two job queues², denoted $Q_\alpha$, where $\alpha \in \{A, B\}$ indicates
a process selection. In our notation, $\bar{\alpha}$ means ‘other than $\alpha$’, i.e., if $\alpha = A$ then
$\bar{\alpha} = B$, and if $\alpha = B$ then $\bar{\alpha} = A$.

The component receives from the event generator of process $\alpha$, at regular
intervals with period $\hat{g}_\alpha$, either ‘Invoke α’ or ‘FalseInvoke α’. In the latter case
(i.e., no invocation), the job in the tail of the queue is ‘pulled’ away³.
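The readiness test of Fig. 4 can be written as a pure C function of the two queue heads. The names `Side`, `QueueHead`, `ready` and `can_start` are hypothetical, and the shared `busy` flag in `can_start` is our reading of the partially legible figure. Note the role of anticipated jobs: if A's head job has the same time stamp as B's but is still non-active, B is not ready until A's invocation is either confirmed (‘Invoke A’) or cancelled (‘FalseInvoke A’).

```c
#include <stdbool.h>

typedef enum { PROC_A, PROC_B } Side;   /* A has the higher functional priority */

typedef struct { long time; bool active; } QueueHead;   /* head of Q_alpha */

/* ready_alpha(): the head job of Q_alpha is active and was invoked strictly
   earlier than the head of the other queue, or at the same time if alpha = A. */
bool ready(Side alpha, QueueHead own, QueueHead other) {
    if (!own.active) return false;
    return own.time < other.time ||
           (own.time == other.time && alpha == PROC_A);
}

/* Start guard (assumption read from the figure): a job may start only if
   no job of the pair is currently running. */
bool can_start(Side alpha, QueueHead own, QueueHead other, bool busy) {
    return !busy && ready(alpha, own, other);
}
```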


2 Queues are implemented by a circular buffer with the following operations:
— Allocate() picks an available (statically allocated) cell and gives a reference to it
— Push() pushes the last allocated cell into the tail
— Pull() undoes the push
— Pop() retrieves the data from the head of the queue.


3 Thanks to ‘init α’ and ‘advance α’, the queue tail always contains the next
anticipated job, which is conservatively marked as non-active until the ‘Invoke α’ transition.
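The queue operations and the ‘init’/‘advance’ steps described in the footnotes above form a small data structure, sketched below in C for one side of the component. The capacity `QCAP`, the struct layouts and all function names are assumptions made for illustration, and overflow is simply reported by a `false` return value. The sequence follows Fig. 4: ‘Invoke α’ marks the anticipated tail job active and then advances; ‘FalseInvoke α’ pulls the anticipated job and then advances.

```c
#include <stdbool.h>
#include <stddef.h>

#define QCAP 4   /* hypothetical job queue size q_alpha */

typedef struct { long time; bool active; } JobCell;

/* Circular buffer over statically allocated cells. */
typedef struct {
    JobCell cell[QCAP];
    size_t head;    /* index of the oldest pushed cell */
    size_t count;   /* number of pushed cells */
} JobQueue;

/* Allocate(): the free cell right after the tail, or NULL if the queue is full. */
JobCell *q_allocate(JobQueue *q) {
    if (q->count == QCAP) return NULL;
    return &q->cell[(q->head + q->count) % QCAP];
}
/* Push(): append the last allocated cell at the tail. */
void q_push(JobQueue *q) { q->count++; }
/* Pull(): undo the last push (caller ensures the queue is non-empty). */
void q_pull(JobQueue *q) { q->count--; }
/* Pop(): remove and return the head cell (caller ensures non-empty). */
JobCell q_pop(JobQueue *q) {
    JobCell c = q->cell[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return c;
}
/* Tail cell (caller ensures non-empty). */
JobCell *q_tail(JobQueue *q) { return &q->cell[(q->head + q->count - 1) % QCAP]; }

/* One side (process alpha) of the functional priority component. */
typedef struct { JobQueue q; long t; long poll; } Poller;

/* advance_alpha(): t := t + poll and push the next anticipated, non-active job. */
bool poller_advance(Poller *s) {
    JobCell *c = q_allocate(&s->q);
    if (c == NULL) return false;   /* queue overflow */
    s->t += s->poll;
    c->time = s->t;
    c->active = false;
    q_push(&s->q);
    return true;
}

/* init_alpha(): t := 0 and push an anticipated job for time 0. */
void poller_init(Poller *s, long poll) {
    s->q.head = 0;
    s->q.count = 0;
    s->t = 0;
    s->poll = poll;
    JobCell *c = q_allocate(&s->q);   /* never NULL on an empty queue */
    c->time = 0;
    c->active = false;
    q_push(&s->q);
}

/* 'Invoke alpha': confirm the anticipated job, then advance. */
bool poller_invoke(Poller *s) {
    q_tail(&s->q)->active = true;
    return poller_advance(s);
}

/* 'FalseInvoke alpha': cancel (pull) the anticipated job, then advance. */
bool poller_false_invoke(Poller *s) {
    q_pull(&s->q);
    return poller_advance(s);
}
```

After `poller_init`, one `poller_invoke` and one `poller_false_invoke` with a 10-unit poll period, the queue holds the confirmed job at time 0 and the anticipated job at time 20.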
6 Model Transformation Framework


The model-based design philosophy for embedded systems which we follow [14]
is grounded on evolutionary design using models, which support gradual
refinement (refined models are more accurate than the models they refine) and
the setting of real-time attributes that ensure predictable timing. Such a process
allows considering various design scenarios and promotes the late binding of
design decisions. Our approach to refinement is based on incremental
component-based models, where the system is evolved by incrementally plugging
new components and transforming existing ones.
[Figure: design flow from the functional code and the architectural model (refined with data channels and priorities), via the TASTE-to-BIP model transformation, to the FPPN BIP model; the task graph and static schedule feed a schedule-to-BIP transformation that plugs the online scheduler, yielding the System BIP model.]

Fig. 5. Evolutionary design of time-critical systems using FPPNs


We propose such a design approach (Fig. 5), in which we take as a starting 
point a set of tasks defined by their functional code and real-time attributes 
(e.g., periods, deadlines, WCET, job queue capacity). We assume that these 
tasks are encapsulated into software-architecture functional blocks, correspond- 
ing to FPPN processes. Before being integrated into a single architectural model 
they can be compiled and tested separately by functional simulation or by
running on the embedded platform.

The high-level architecture description framework of our choice is the TASTE 
toolset [14, 19], whose front-end tools are based on the AADL (Architecture Anal- 
ysis & Design Language) syntax [7]. An architecture model in TASTE consists of 
functional blocks — so-called ‘functions’ — which interact with each other via pairs 
of interfaces (IF) ‘required IF’/‘provided IF’, where the first performs a proce- 
dure call in the second one. In TASTE, the provided interfaces can be explicitly 
used for task invocations, i.e., they may get attributes like ‘periodic’ /‘sporadic’, 
‘deadline’ and ‘period’. The FPPN processes are represented by TASTE ‘func- 
tions’ that ‘provide’ such interfaces, implementing job execution of the respective 
task in C/C++. Our TASTE-to-BIP framework is available for download at [2]. 

The first refinement step is plugging the data channels for explicit commu- 
nication between the processes. The data channels are also modeled as TASTE 
functions, whereas reads and writes are implemented via interfaces. We have 
Fig. 6. Model and graph transformations for the FPPN semantics


amended the attributes of TASTE functions to reflect the priority index of pro- 
cesses and the parameters of FPPN channels, such as capacity of FIFO channels. 
The resulting model can be compiled and simulated in TASTE. 

The second and final refinement step is scheduling. To schedule on multi-cores
while respecting the real-time semantics of FPPN, this step is preceded by a
transformation from the TASTE architectural model into a BIP FPPN model. The
transformation process implements the FPPN-to-BIP ‘compilation’ sketched in
the previous section, and we believe it could be formalized by a set of
transformation rules. For example, as illustrated in Fig. 6, one of the rules could say
that if there are two tasks $\tau_1$ and $\tau_2$ related by the FP relation, then their respective
BIP components $B_1$ and $B_2$ are connected (via ‘Start’ and ‘Finish’ ports) to a
functional priority component.

The scheduling is done offline, by first deriving a task graph from the archi- 
tectural model, taking into account the periods, functional priorities and WCET 
of processes. The task graph represents a maximal set of jobs invoked in a hyper- 
period and their precedence constraints; it defines the invocation and the
deadline of jobs relative to the hyperperiod start time. The task graph derivation
algorithm is detailed in [20]. 


Definition 5 (Task Graph). A directed acyclic graph $TG(J, E)$ whose nodes
$J = \{J_i\}$ are jobs defined by tuples $J_i = (p_i, k_i, A_i, D_i, W_i)$, where $p_i$ is the
job’s process, $k_i$ is the job’s invocation count, $A_i \in \mathbb{Q}_{\geq 0}$ is the invocation time,
$D_i \in \mathbb{Q}_+$ is the absolute deadline and $W_i \in \mathbb{Q}_+$ is the WCET. The $k$-th job of
process $p$ is denoted by $p[k]$. The edges $E$ represent the precedence constraints.
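The node set of Definition 5 can be enumerated mechanically for periodic processes. The C sketch below computes the hyperperiod as the least common multiple of the periods and lists one node $(p_i, k_i, A_i, D_i, W_i)$ per invocation, with $A_i = \text{offset} + (k_i - 1)T_p$ and $D_i = A_i + d_p$. The `Proc`/`Job` types and function names are hypothetical; burst size 1 and offsets smaller than the period are assumed, and the precedence edges $E$, whose derivation is given in [20], are not computed.

```c
#include <stddef.h>

/* Hypothetical per-process parameters (ms): period T, relative deadline d,
   invocation offset, WCET W. */
typedef struct { int id; long T, d, offset, W; } Proc;

/* Task-graph node J_i = (p_i, k_i, A_i, D_i, W_i) of Definition 5. */
typedef struct { int p; long k, A, D, W; } Job;

long gcd_l(long a, long b) {
    while (b != 0) { long r = a % b; a = b; b = r; }
    return a;
}

/* Hyperperiod: least common multiple of all periods. */
long hyperperiod(const Proc *ps, size_t n) {
    long h = 1;
    for (size_t i = 0; i < n; i++) h = h / gcd_l(h, ps[i].T) * ps[i].T;
    return h;
}

/* List the jobs invoked in one hyperperiod (burst size 1 assumed).
   Writes at most cap jobs to out[] and returns how many were written. */
size_t enumerate_jobs(const Proc *ps, size_t n, Job *out, size_t cap) {
    long H = hyperperiod(ps, n);
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        for (long k = 1; k <= H / ps[i].T && m < cap; k++) {
            long A = ps[i].offset + (k - 1) * ps[i].T;   /* invocation time */
            out[m++] = (Job){ ps[i].id, k, A, A + ps[i].d, ps[i].W };
        }
    return m;
}
```

Fed with the GNC parameters of Sect. 7 (periods of 50 ms and 500 ms, offsets of 30 ms for the Control Output Task and 450 ms for the Guidance Navigation Task), it yields a 500 ms hyperperiod and 31 jobs, matching the count reported there.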
The task graph is given as input to a static scheduler. The schedule obtained
from the static scheduler is translated into parameters for the online scheduler
(cf. Fig. 6), which, on top of the functional priority components, further
constrains the job execution order and timing, with the purpose of ensuring
deadline satisfaction. The joint application/scheduler BIP model is called System
Model. This model is eventually compiled and linked with the BIP-RTE, which 
ensures correct BIP semantics of all components online [23]. 


7 Case Study: Guidance, Navigation and Control Application


Our design flow was applied to a Guidance Navigation & Control (GNC) on- 
board spacecraft application that was ported onto ESA’s NGMP, more specifi- 
cally the quad-core LEON4FT processor [1]. In the space industry, multi-cores 
provide a means for integrating more software functions onto a single platform, 
which contributes to reducing size, weight, cost, and power consumption. On- 
board software has to efficiently utilize the processor resources, while retaining 
predictability. 

A GNC application affects the movement of the vehicle by reading the sensors and controlling the actuators. We estimated the WCETs $W_p$ of all tasks by measurements. There are four tasks: the Guidance Navigation Task ($T_p = 500$ ms, $d_p = 500$ ms, $W_p = 22$ ms); the Control Output Task ($T_p = 50$ ms, $d_p = 50$ ms, $W_p = 3$ ms), which sends the outputs to the appropriate spacecraft unit; the Control FM Task ($T_p = 50$ ms, $d_p = 50$ ms, $W_p = 8$ ms), which runs the control and flight management algorithms; and the Data Input Dispatcher Task ($T_p = 50$ ms, $d_p = 50$ ms, $W_p = 6$ ms), which reads, decodes and dispatches data to the right destination whenever new data from the spacecraft's sensors are available. The hyperperiod of the system was therefore 500 ms; it includes one execution of the Guidance Navigation Task and ten executions of each other task, which results in 31 jobs. The Guidance Navigation and Control Output tasks were invoked with relative time offsets of 450 ms and 30 ms, respectively. Fig. 7 shows the GNC FPPN, where the functional priorities impose precedence from the numerically smaller FP index (i.e., higher priority) to the numerically larger ones. We defined these priorities based on an analysis of the specification documents and of the original implementation of task interactions by inter-thread signalling.

The architectural model in TASTE format was automatically transformed 
into a BIP model and the task-graph model of the hyperperiod was derived. The 
task graph was passed to the static scheduler, which calculated the system load 
to be 112% (i.e., at least two cores required, taking into account precedences [20] 
and interference [21]) and generated the static schedule. 

The BIP model was compiled and linked with the BIP RTE, and the executables were loaded and run on the LEON4FT board. Figure 8 shows the measured Gantt chart of a hyperperiod (500 ms) plus 100 ms. We label the process executions as 'P<id>', where '<id>' is a numeric process identifier. Label 'P20' is an exception: it indicates the execution of the BIP RTE engine and of all discrete-event controllers (event generators, functional priority controllers, and the online scheduler). Since there are four discrete transitions per job execution and 31 jobs per hyperperiod, the BIP RTE executes 31 × 4 = 124 discrete transitions per hyperperiod. The P20 activities were mapped to Core 0, whereas the jobs of tasks P1, P2, P3 and P4 were mapped to Core 1 and Core 2. P1 stands for the Data Input Dispatcher, P2 for the Control FM, P3 for the Control Output and P4 for the Guidance Navigation task. The job of P4 is executed right after ten consecutive jobs of P1, P2 and P3; it is delayed due to its 450 ms invocation offset and its lowest functional priority. Since P3 and P4 do not communicate via channels, in our framework (P3, P4) ∉ FP, so they can execute in parallel, which our static schedule indeed exploits. Because the system load exceeds 100%, this parallelism was necessary for deadline satisfaction.

Fig. 8. Execution of the GNC application on LEON4FT (in microseconds).
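The job and transition counts can be reproduced by enumerating the releases of P1 to P4 in one 500 ms hyperperiod from their periods, deadlines and invocation offsets (30 ms for P3, 450 ms for P4), then multiplying by the four discrete transitions per job. This Python sketch is illustrative only: the zero offsets of P1 and P2 and the numbering of jobs from p[1] are assumptions, as the text does not state them.

```python
HYPERPERIOD_MS = 500
TRANSITIONS_PER_JOB = 4  # discrete BIP transitions per job execution, as stated

# process -> (period, relative deadline, invocation offset), all in ms.
# The offsets of P1 and P2 are not given in the text; 0 is assumed here.
PROCESSES = {
    "P1": (50, 50, 0),      # Data Input Dispatcher
    "P2": (50, 50, 0),      # Control FM
    "P3": (50, 50, 30),     # Control Output
    "P4": (500, 500, 450),  # Guidance Navigation
}


def jobs_in_hyperperiod(processes, hyperperiod):
    """List (process, k, release, absolute deadline) for each job p[k] released in [0, H)."""
    jobs = []
    for name, (period, deadline, offset) in processes.items():
        k, release = 1, offset  # numbering from p[1] is an assumption
        while release < hyperperiod:
            jobs.append((name, k, release, release + deadline))
            k += 1
            release += period
    return jobs


if __name__ == "__main__":
    jobs = jobs_in_hyperperiod(PROCESSES, HYPERPERIOD_MS)
    print(len(jobs))                          # 31
    print(len(jobs) * TRANSITIONS_PER_JOB)    # 124
    print([j for j in jobs if j[0] == "P4"])  # [('P4', 1, 450, 950)]
```

The single P4 job is only released at 450 ms, which is consistent with its late position in the Gantt chart.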
8 Conclusion


We presented the formal semantics of the FPPN model at two levels: a zero-delay semantics with precedence constraints on the job execution order to ensure functional determinism, and a real-time semantics for scheduling. The semantics was implemented in a model transformation framework. Our approach was validated on a spacecraft on-board application running on a multi-core processor. In future work, we consider it important to improve the efficiency of code generation and to formally prove the equivalence between the scheduling constraints (such as the task graph) and the generated BIP model. The offline and online schedulers also need to be extended to support a wider spectrum of online policies and to be more aware of resource interference.
Abstract. In safety-critical cyber-physical systems (CPS), a service failure may result in severe financial loss or damage to human life. Smart CPSs have complex interactions with their environment, which is rarely known in advance; they heavily depend on intelligent data processing carried out over a heterogeneous computation platform and provide autonomous behavior. This complexity makes design-time verification infeasible in practice, and many CPSs need advanced runtime monitoring techniques to ensure safe operation. Graph queries are a powerful technique used in many industrial design tools for CPSs; in this paper, we propose to use them to specify safety properties for runtime monitors at a high level of abstraction. Distributed runtime monitoring is carried out by evaluating graph queries over a distributed runtime model of the system, which incorporates domain concepts and platform information. We provide a semantic treatment of distributed graph queries using 3-valued logic. Our approach is illustrated and initially evaluated using the MoDeS3 educational demonstrator of CPSs.


1 Introduction 


A smart and safe cyber-physical system (CPS) [23,30,36] heavily depends on intelligent data processing carried out over a heterogeneous computation platform to provide autonomous behavior with complex interactions with an environment that is rarely known in advance. Such complexity frequently makes design-time verification infeasible in practice, so CPSs need to rely on runtime verification (RV) techniques to ensure safe operation by monitoring. Traditionally, RV techniques have evolved from formal methods [24,26], which provide a high level of precision but offer a low-level specification language (with simple atomic predicates to capture information about the system), which hinders their use in everyday engineering practice. Recent RV approaches [17] have started to exploit rule-based approaches over a richer information model.
In this paper, we aim to address runtime monitoring of distributed systems from a different perspective by using runtime models (aka models@runtime [8,38]), which have been promoted for the assurance of self-adaptive systems in [10,44]. The idea is that runtime models serve as a rich knowledge base for the system by capturing the runtime status of the domain, services and platforms as a graph model, which serves as a common basis for executing various analysis algorithms. Offering centralized runtime models accessible via the network, the Kevoree Modeling Framework [28] has been successfully applied in numerous Internet-of-Things applications over the Java platform. However, using such runtime models for analysis purposes in resource-constrained smart devices or critical CPS components is problematic due to the lack of control over the actual deployment of the model elements to the execution units of the platform.

Graph queries have already been applied in various design and analysis tools for CPSs thanks to their highly expressive declarative language and their scalability to large industrial models [40]. Distributed graph query evaluation techniques have been proposed in [22,34], but all of these approaches use a cloud-based execution environment, and the techniques are not directly applicable to a heterogeneous execution platform with low-memory computation units.

As a novelty of this paper, we specify safety criteria for runtime monitoring by graph queries formulated over runtime models (with domain concepts,
platform elements, and allocation as runtime information) where graph query 
results highlight model elements that violate a safety criterion. Graph queries 
are evaluated over a distributed runtime model where each model element is 
managed by a dedicated computing unit of the platform while relevant contex- 
tual information is communicated to neighboring computing units periodically 
via asynchronous messages. We provide a semantic description for the distributed 
runtime model using 3-valued logic to uniformly capture contextual uncertainty 
or message loss. Then we discuss how graph queries can be deployed as a service 
to the computing units (i.e., low-memory embedded devices) of the execution 
platform of the system in a distributed way, and provide precise semantics of 
distributed graph query evaluation over our distributed runtime model. We pro- 
vide an initial performance evaluation of our distributed query technique over 
the MoDeS3 CPS demonstrator [45], which is an open source educational plat- 
form, and also compare its performance to an open graph query benchmark [35]. 


2 Overview of Distributed Runtime Monitoring 


Figure 1 gives an overview of distributed runtime monitoring of CPSs deployed over a heterogeneous computing platform using runtime models and graph queries. Our approach reuses a high-level graph query language [41], which is widely used in various design tools of CPS [37], for specifying safety properties of runtime monitors. Graph queries can capture safety properties with rich structural dependencies between system entities, which most temporal logic formalisms used for runtime monitoring do not support. Similarly, OCL has been used in [20] for related purposes. While graph queries can be extended to express temporal behavior [11], our current work is restricted to (structural) safety properties where the violation of a property is expressible by graph queries.

Fig. 1. Distributed runtime monitoring by graph queries

These queries will be evaluated over a runtime model that reflects the current state of the monitored system, e.g. data received from different sensors, the services allocated to computing units, or the health information of the computing infrastructure. In accordance with the models@runtime paradigm [8,38], observable changes of the real system are reflected in the runtime model, either periodically with a certain frequency or in an event-driven way upon certain triggers.

Runtime monitor programs are deployed to a distributed heterogeneous computation platform, which may include various types of computing units ranging from ultra-low-power microcontroller units, through smart devices, to high-end cloud-based servers. These computation units primarily process the data provided by sensors, and they are able to perform edge- or cloud-based computations based on the acquired information. The monitoring programs are deployed and executed on them in the same way as the primary services of the system, thus resource restrictions (CPU, memory) need to be respected during allocation.

Runtime monitors are synthesized by transforming high-level query specifications into deployable, platform-dependent source code for each computation unit used as part of a monitoring service. The synthesis includes a query optimization step and a code generation step to produce platform-dependent C++ source code ready to be compiled into an executable for the platform. Due to space restrictions, this component of our framework is not detailed in this paper.

Our system-level monitoring framework is hierarchical and distributed. Monitors may observe the local runtime model of their own computing unit, and they can collect information from runtime models of different devices, hence providing a distributed monitoring architecture. Moreover, one monitor may rely on information computed by other monitors, thus yielding a hierarchical network.
Running Example. We illustrate our runtime monitoring technique in the context of a CPS demonstrator [45], an educational platform of a model railway system that prevents trains from collision and derailment using safety monitors. The railway track is equipped with several sensors (cameras, shunt detectors) capable of sensing trains on a particular segment of the track; these sensors are connected to computing units such as Arduinos, Raspberry Pis, BeagleBone Blacks (BBB), or a cloud platform. Computing units also control actuators that stop trains on selected segments to guarantee safe operation. For space considerations, we only present a small self-contained fragment of the demonstrator.

In Fig. 1, the System Under Monitor is a snapshot of the system where train tr1 is on segment s4, while tr2 is on s2. The railroad network has a static layout, but turnouts tu1 and tu2 can change between straight and divergent states. Three BBB computing units are responsible for monitoring and controlling disjoint parts of the system. A computing unit may read its local sensors (e.g. the occupancy of a segment or the status of a turnout), collect information from other units during monitoring, and operate actuators accordingly (e.g. change the state of a turnout) for its designated segment. All this information is reflected in the (distributed) runtime model, which is deployed on the three computing units and available to the runtime monitors.


3 Towards Distributed Runtime Models 


3.1 Runtime Models 


Many industrial modeling tools used for engineering CPS [3,31,47] build on the 
concepts of domain-specific (modeling) languages (DSLs) where a domain is typ- 
ically defined by a metamodel and a set of well-formedness constraints. A meta- 
model captures the main concepts in a domain as classes with attributes, their 
relations as references, and specifies the basic structure of graph models. 

A metamodel can be formalized as a vocabulary $\Sigma = \{C_1, \ldots, C_{n_1}, A_1, \ldots, A_{n_2}, R_1, \ldots, R_{n_3}\}$ with a unary predicate symbol $C_i$ for each class, a binary predicate symbol $A_j$ for each attribute, and a binary predicate symbol $R_k$ for each relation.


Example 1. Figure 2 shows a metamodel for the CPS demonstrator with Computing Units (identified on the network by the hostID attribute), which host Domain Elements and communicate with other Computing Units. A Domain Element is either a Train or a Railroad Element, where the latter is either a Turnout or a Segment. A Train is situated on a Railroad Element, which is connected to at most two other Railroad Elements. Furthermore, a Turnout refers to the Railroad Elements connecting to its straight and divergent exits. A Train also knows its speed.
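To illustrate the vocabulary construction on Example 1, the Python sketch below derives Σ from a hand-written transcription of the metamodel in Fig. 2, with one unary symbol per class and one binary symbol per attribute and reference. The class and feature names follow the figure; the `Symbol` type and the `vocabulary` helper are hypothetical, and inheritance and multiplicities are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A predicate symbol of the vocabulary Sigma."""
    name: str
    arity: int
    kind: str  # "class", "attribute" or "reference"


# Hand-transcribed from Fig. 2 / Example 1 (inheritance and multiplicities omitted)
CLASSES = ["ModelRoot", "ComputingUnit", "DomainElement", "Train",
           "RailroadElement", "Turnout", "Segment"]
ATTRIBUTES = ["hostID", "speed"]
REFERENCES = ["domainElements", "computingUnits", "hosts", "communicatesWith",
              "on", "connectedTo", "straight", "divergent"]


def vocabulary(classes, attributes, references):
    """Unary symbols for classes, binary symbols for attributes and references."""
    return ([Symbol(c, 1, "class") for c in classes]
            + [Symbol(a, 2, "attribute") for a in attributes]
            + [Symbol(r, 2, "reference") for r in references])


SIGMA = vocabulary(CLASSES, ATTRIBUTES, REFERENCES)

if __name__ == "__main__":
    for symbol in SIGMA:
        print(f"{symbol.name}/{symbol.arity} ({symbol.kind})")
```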


Objects, their attributes, and links between them constitute a runtime model [8,38] of the underlying system in operation. Changes to the system and its environment are reflected in the runtime model (in an event-driven or time-triggered way), and operations executed on the runtime model (e.g. setting values of controllable attributes or relations between objects) are reflected in the system itself (e.g. by executing scripts or calling services). We assume that this runtime model is self-descriptive in the sense that it contains information about the computation platform and the allocation of services to platform elements, which is a key enabler for self-adaptive systems [10,44].

Distributed Graph Queries for Runtime Monitoring 115

[Figure: metamodel with classes Model Root, Computing Unit, Domain Element, Railroad Element, Train (speed : EInt), Turnout and Segment; references domainElements [0..*], computingUnits [0..*], hosts [0..*], communicatesWith [0..*], connectedTo [0..2], on [1..1], straight [1..1] and divergent [1..1].]
Fig. 2. Metamodel for CPS demonstrator

A runtime model $M = \langle \mathit{Dom}_M, \mathcal{I}_M \rangle$ can be formalized as a 2-valued logic structure over $\Sigma$, where $\mathit{Dom}_M = \mathit{Obj}_M \cup \mathit{Data}_M$, $\mathit{Obj}_M$ is a finite set of objects, and $\mathit{Data}_M$ is the set of (built-in) data values (integers, strings, etc.). $\mathcal{I}_M$ is a 2-valued interpretation of the predicate symbols in $\Sigma$ defined as follows:

— Class predicates: If object $o_p$ is an instance of class $C_i$, then the 2-valued interpretation of $C_i$ in $M$ is $[\![C_i(o_p)]\!]^M = 1$, otherwise 0.

— Attribute predicates: If there exists an attribute of type $A_j$ in $o_p$ with value $a_r$ in $M$, then $[\![A_j(o_p, a_r)]\!]^M = 1$, otherwise 0.

— Reference predicates: If there is a link of type $R_k$ from $o_p$ to $o_q$ in $M$, then $[\![R_k(o_p, o_q)]\!]^M = 1$, otherwise 0.
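One simple way to read this definition operationally is to store M as a finite set of ground facts and let the interpretation return 1 exactly for the stored facts (a closed-world reading). The Python sketch below does so for a fragment of the Fig. 1 snapshot (tr1 on s4, tr2 on s2). The `RuntimeModel` class and the tuple encoding of facts are illustrative assumptions, not the framework's generated C++ classes; attribute facts would be stored the same way, e.g. as (attribute, object, value) tuples.

```python
class RuntimeModel:
    """2-valued structure M: a ground predicate is 1 iff it is a stored fact, else 0."""

    def __init__(self, facts):
        # Each fact is a tuple (predicate, arg1, ...), e.g. ("On", "tr1", "s4").
        self.facts = set(facts)

    def objects(self):
        """Obj_M, approximated here as every argument of a unary (class) fact."""
        return {fact[1] for fact in self.facts if len(fact) == 2}

    def eval(self, predicate, *args):
        """[[predicate(args)]]^M under a closed-world reading of the fact set."""
        return 1 if (predicate, *args) in self.facts else 0


# Fragment of the Fig. 1 snapshot: train tr1 is on segment s4, tr2 is on s2.
SNAPSHOT = RuntimeModel({
    ("Train", "tr1"), ("Train", "tr2"),
    ("Segment", "s2"), ("Segment", "s4"),
    ("On", "tr1", "s4"), ("On", "tr2", "s2"),
})

if __name__ == "__main__":
    print(SNAPSHOT.eval("On", "tr1", "s4"))  # 1
    print(SNAPSHOT.eval("On", "tr1", "s2"))  # 0
```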


3.2 Distributed Runtime Models 


Our framework addresses decentralized systems where each computing unit periodically communicates a part of its internal state to its neighbors in an update phase. We abstract from the technical details of communication, but we assume approximate synchrony [13] between the clocks of computing units; thus, all update messages that do not arrive within a given timeframe $T_{\mathit{update}}$ are regarded as lost.

As such, a centralized runtime model is not a realistic assumption for mixed 
synchronous systems. First, each computing unit has only incomplete knowledge 
about the system: it fully observes and controls a fragment of the runtime model 
(to enforce the single source of truth principle), while it is unaware of the internal 
state of objects hosted by other computing units. Moreover, uncertainty may 
arise in the runtime model due to sensing or communication issues. 


Semantics of Distributed Runtime Models. We extend the concept of runtime 
models to a distributed setting with heterogeneous computing units which peri- 
odically communicate certain model elements with each other via messages. 
We introduce a semantic representation for distributed runtime models (DRMs) which can abstract from the actual communication semantics (e.g. asynchronous messages vs. broadcast messages) by (1) evaluating predicates locally at a computing unit with (2) a 3-valued truth evaluation that has a third value 1/2 in case of uncertainty. Each computing unit maintains a set of facts described by atomic predicates in its local knowledge base with respect to the objects (with attributes) it hosts and the references between local objects. Additionally, each computing unit incorporates predicates describing outgoing references for each object it hosts. The 3-valued truth evaluation of a predicate $P(v_1, \ldots, v_n)$ on a computing unit $cu$ is denoted by $[\![P(v_1, \ldots, v_n)]\!]@cu$. The DRM of the system is constituted by the truth evaluation of all predicates on all computing units. In this paper, we assume the single source of truth principle, i.e. each model element is always faithfully observed and controlled by its host computing unit, thus the local truth evaluation of the corresponding predicate $P$ is always 1 or 0. However, the 3-valued evaluation could be extended to handle such local uncertainties.
Fig. 3. Distributed runtime model for CPS demonstrator


Example 2. Figure 3 shows a DRM snapshot for the CPS demonstrator (bottom part of Fig. 1). Computing units BBB1 to BBB3 manage different parts of the system, e.g. BBB1 hosts objects s1, s2, tu1 and tr2 and the links between them. We illustrate the local knowledge bases of the computing units.

Since computing unit BBB1 hosts train tr2, $[\![\mathit{Train}(tr2)]\!]@BBB1 = 1$. However, according to computing unit BBB2, $[\![\mathit{Train}(tr2)]\!]@BBB2 = 1/2$, as there is no train tr2 hosted on BBB2, but it may exist on a different one.

Similarly, $[\![\mathit{ConnectedTo}(s1, s7)]\!]@BBB1 = 1$, as BBB1 is the host of s1, the source of the reference. This means BBB1 knows that there is a (directed) reference of type connectedTo from s1 to s7. However, the knowledge base on BBB3 may have uncertain information about this link, thus $[\![\mathit{ConnectedTo}(s1, s7)]\!]@BBB3 = 1/2$, i.e. there may be a corresponding link from s1 to s7, but it cannot be deduced exclusively from the predicates evaluated at BBB3.
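Example 2 can be reproduced with a small sketch of per-unit knowledge bases. Under the single source of truth principle, a unit answers 1 or 0 for predicates about objects it hosts (for references, the source object) and 1/2 otherwise. In this Python sketch, `LocalKB` and the rule that a predicate belongs to the host of its first argument are our assumptions; only the hosted objects and facts mentioned in Examples 2 and 4 are listed, and BBB3's knowledge base is left empty because the text does not enumerate its objects.

```python
from fractions import Fraction

UNKNOWN = Fraction(1, 2)


class LocalKB:
    """Local knowledge base of one computing unit in a distributed runtime model."""

    def __init__(self, name, hosted, facts):
        self.name = name
        self.hosted = set(hosted)  # objects hosted (observed and controlled) by this unit
        self.facts = set(facts)    # facts about hosted objects and their outgoing references

    def eval(self, predicate, *args):
        """[[predicate(args)]]@unit: 1 or 0 if the first argument is hosted here, else 1/2."""
        if args[0] in self.hosted:
            return 1 if (predicate, *args) in self.facts else 0
        return UNKNOWN


# Only hosts and facts mentioned in Examples 2 and 4 are listed; the rest is unknown.
BBB1 = LocalKB("BBB1", {"s1", "s2", "tu1", "tr2"},
               {("Train", "tr2"), ("On", "tr2", "s2"), ("ConnectedTo", "s1", "s7"),
                ("ConnectedTo", "s2", "s3"), ("ConnectedTo", "s2", "tu1")})
BBB2 = LocalKB("BBB2", {"tr1", "s3"}, {("Train", "tr1"), ("On", "tr1", "s4")})
BBB3 = LocalKB("BBB3", set(), set())  # objects of BBB3 are not listed in the text

if __name__ == "__main__":
    print(BBB1.eval("Train", "tr2"), BBB2.eval("Train", "tr2"))  # 1 1/2
    print(BBB3.eval("ConnectedTo", "s1", "s7"))                   # 1/2
```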


4 Distributed Runtime Monitoring 


4.1 Graph Queries for Specifying Safety Monitors 


To capture the safety properties to be monitored, we rely on the VIATRA Query 
Language (VQL) [7]. VIATRA has been intensively used in various design tools 
of CPSs to provide scalable queries over large system models. The current paper
aims to reuse this declarative graph query language for runtime verification pur- 
poses, which is a novel idea. The main benefit is that safety properties can be 
captured on a high level of abstraction over the runtime model, which eases the 
definition and comprehension of safety monitors for engineers. Moreover, this 
specification is free from any platform-specific or deployment details. 

The expressive power of VQL is close to first-order logic with transitive closure, thus it provides a rich language for capturing a variety of complex structural conditions and dependencies. Technically, a graph query captures the erroneous case: when the query is evaluated over a runtime model, any match (result) of the query highlights a violation of the safety property at runtime.


Example 3. In the railway domain, safety standards prescribe a minimum dis- 
tance between trains on track [1,14]. Query closeTrains captures a (simplified) 
description of the minimum headway distance to identify violating situations 
where trains have only limited space between each other. Technically, one needs 
to detect if there are two different trains on two different railroad elements, which 
are connected by a third railroad element. Any match of this pattern highlights 
track elements where passing trains need to be stopped immediately. Figure 4a 
shows the graph query closeTrains in a textual syntax, Fig. 4b displays it as a 
graph formula, and Fig. 4c is a graphical illustration as a graph pattern. 


Syntax. Formally, a graph pattern (or query) is a first-order logic (FOL) formula $\varphi(v_1, \ldots, v_n)$ over variables [42]. A graph pattern $\varphi$ can be inductively constructed (see Table 1) by using atomic predicates of runtime models $C(v)$, $A(v_1, v_2)$, $R(v_1, v_2)$ with $C, A, R \in \Sigma$, equality between variables $v_1 = v_2$, FOL connectives $\vee$, $\wedge$, quantifiers $\exists$, $\forall$, and positive (call) or negative (neg) pattern calls.


    pattern closeTrains(St : RailroadElement, End : RailroadElement) {
      Train.on(T, St);
      Train.on(OT, End);
      T != OT;
      RailroadElement.connectedTo(St, Mid);
      RailroadElement.connectedTo(Mid, End);
      St != End;
    }

(a) Graph query in the VIATRA Query Language

$$\mathit{CloseTrains}(St, End) = \mathit{RailroadElement}(St) \wedge \mathit{RailroadElement}(End) \wedge \exists T : \mathit{Train}(T) \wedge \mathit{On}(T, St) \wedge \exists OT : \mathit{Train}(OT) \wedge \mathit{On}(OT, End) \wedge \neg(T = OT) \wedge \exists Mid : \mathit{RailroadElement}(Mid) \wedge \mathit{ConnectedTo}(St, Mid) \wedge \mathit{ConnectedTo}(Mid, End) \wedge \neg(St = End)$$

(b) Query as formula

[Graphical pattern closeTrains(St, End): T : Train -on-> St : RailroadElement -connectedTo-> Mid : RailroadElement -connectedTo-> End : RailroadElement <-on- OT : Train, with constraints St != End and T != OT]

(c) Graphical query representation


Fig. 4. Safety monitoring objective closeTrains specified as graph pattern 
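Table 1. Semantics of graph patterns (predicates)

1. $[\![C(v)]\!]^M_Z := \mathcal{I}_M(C)(Z(v))$
2. $[\![A(v_1, v_2)]\!]^M_Z := \mathcal{I}_M(A)(Z(v_1), Z(v_2))$
3. $[\![R(v_1, v_2)]\!]^M_Z := \mathcal{I}_M(R)(Z(v_1), Z(v_2))$
4. $[\![\exists v : \varphi]\!]^M_Z := \max\{[\![\varphi]\!]^M_{Z, v \mapsto x} : x \in \mathit{Obj}_M\}$
5. $[\![\forall v : \varphi]\!]^M_Z := \min\{[\![\varphi]\!]^M_{Z, v \mapsto x} : x \in \mathit{Obj}_M\}$
6. $[\![v_1 = v_2]\!]^M_Z := 1$ iff $Z(v_1) = Z(v_2)$, otherwise 0
7. $[\![\varphi_1 \wedge \varphi_2]\!]^M_Z := \min([\![\varphi_1]\!]^M_Z, [\![\varphi_2]\!]^M_Z)$
8. $[\![\varphi_1 \vee \varphi_2]\!]^M_Z := \max([\![\varphi_1]\!]^M_Z, [\![\varphi_2]\!]^M_Z)$
9. $[\![\neg \varphi]\!]^M_Z := 1 - [\![\varphi]\!]^M_Z$
10. $[\![\mathit{call}(\varphi(v_1, \ldots, v_n))]\!]^M_Z := \max\{[\![\varphi(v^c_1, \ldots, v^c_n)]\!]^M_{Z'} : Z \subseteq Z' \wedge \forall i \in 1..n : Z'(v^c_i) = Z(v_i)\}$
11. $[\![\mathit{neg}(\varphi(v_1, \ldots, v_n))]\!]^M_Z := 1 - [\![\mathit{call}(\varphi(v_1, \ldots, v_n))]\!]^M_Z$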


This language makes it possible to specify a hierarchy of runtime monitors, as a query may explicitly use the results of other queries (via pattern calls). Furthermore, distributed evaluation will exploit a spatial hierarchy between computing units.


Semantics. A graph pattern $\varphi(v_1, \ldots, v_n)$ can be evaluated over a (centralized) runtime model $M$ (denoted by $[\![\varphi(v_1, \ldots, v_n)]\!]^M_Z$) along a variable binding $Z : \{v_1, \ldots, v_n\} \to \mathit{Dom}_M$ from variables to objects and data values in $M$, in accordance with the semantic rules defined in Table 1 [42].

A variable binding $Z$ is called a match if pattern $\varphi$ is evaluated to 1 over $M$, i.e. $[\![\varphi(v_1, \ldots, v_n)]\!]^M_Z = 1$. Below, we may use $[\![\varphi(v_1, \ldots, v_n)]\!]$ as a shorthand for $[\![\varphi(v_1, \ldots, v_n)]\!]^M_Z$ when $M$ and $Z$ are clear from context. Note that min and max take the numeric minimum and maximum values of 0, 1/2 and 1 with $0 < 1/2 < 1$.
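Because Table 1 uses the numeric order 0 < 1/2 < 1, its connectives and quantifiers translate directly into code. The following Python sketch covers conjunction (min), disjunction (max), negation (1 - x) and the two quantifiers over a finite object domain, using `Fraction` to keep 1/2 exact. The function names are hypothetical, and a formula is passed to the quantifiers as a Python callable.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def and3(a, b):
    """Conjunction: numeric minimum of the two truth values."""
    return min(a, b)


def or3(a, b):
    """Disjunction: numeric maximum of the two truth values."""
    return max(a, b)


def not3(a):
    """Negation: 1 - x, so 1/2 stays 1/2."""
    return 1 - a


def exists3(domain, phi):
    """Existential quantifier: maximum of phi(x) over all objects (0 if none)."""
    return max((phi(x) for x in domain), default=0)


def forall3(domain, phi):
    """Universal quantifier: minimum of phi(x) over all objects (1 if none)."""
    return min((phi(x) for x in domain), default=1)


if __name__ == "__main__":
    print(or3(1, HALF), and3(1, HALF), not3(HALF))  # 1 1/2 1/2
```

For instance, a disjunction with one certain and one uncertain operand stays certain, while negation leaves an uncertain value uncertain.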


4.2 Execution of Distributed Runtime Monitors 


To evaluate graph queries of runtime monitors in a distributed setting, we pro- 
pose to deploy queries to the same target platform in a way that is compliant 
with the distributed runtime model and the potential resource restrictions of 
computation units. If a graph query engine is deployed as a service on a com- 
puting unit, it can serve as a local monitor over the runtime model. However, 
such local monitors are usable only when all graph nodes traversed and retrieved 
during query evaluation are deployed on the same computing unit, which is not 
the general case. Therefore, a distributed monitor needs to gather information 
from other model fragments and monitors stored at different computing units. 


A Query Cycle. Monitoring queries are evaluated over a distributed runtime 
model during the query cycle, where individual computing units communicate 
with each other asynchronously in accordance with the actor model [18]. 


— A monitoring service can be initiated (or scheduled) at a designated computing unit $cu$ by requesting the evaluation of a graph query with at least one unbound variable, denoted as $[\![\varphi(v_1, \ldots, v_n)]\!]@cu = ?$.

— A computing unit attempts to evaluate a query over its local runtime model.

— If any links of its local runtime model point to a fragment stored at a neighboring computing unit, or if a subpattern call is initiated, the corresponding query $R(v_1, v_2)$, $\mathit{call}(\varphi)$ or $\mathit{neg}(\varphi)$ needs to be evaluated at all neighbors $cu_j$.

— Such calls to distributed monitors are carried out by sending asynchronous messages, thus graph queries are evaluated in a distributed way along the computing platform. First, the requester $cu_r$ sends a message of the form "$[\![\varphi(v_1, \ldots, v_n)]\!]@cu_p = ?$". The provider $cu_p$ needs to send back a reply with further information about its internal state or previous monitoring results, which contains all potential matches known by $cu_p$, i.e. all bindings with $[\![\varphi(o_1, \ldots, o_n)]\!]@cu_p \geq 1/2$ (where we abbreviate the binding $v_i \mapsto o_i$ into the predicate as a notational shortcut).

— Matches of predicates sent as a reply to a computing unit can be cached.

— Messages may get delayed due to network traffic, and they are considered lost by the requester if no reply arrives within a deadline. Such a case introduces uncertainty in the truth evaluation of predicates, i.e. the requester $cu_r$ stores $[\![\varphi]\!]@cu_r = 1/2$ in its cache if the reply of the provider $cu_p$ is lost.

— After acquiring the truth values of predicates from its neighbors, a computing unit needs to decide on a single truth value for each predicate evaluated along different variable bindings. This local decision is detailed below.

— At the end of the query cycle, each computing unit resets its cache to remove the information acquired within the last cycle.
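The request/reply steps of the query cycle can be sketched as a single-threaded Python simulation rather than an actor implementation. Each provider is modelled as a callable returning its potential matches; a lost or late reply is modelled by the provider raising `TimeoutError` (standing in for the missed deadline); the requester caches either the matches with value at least 1/2 or the value 1/2 for the whole predicate; and the cache is cleared at the end of the cycle. `Requester`, `ask`, `end_cycle` and the two providers are hypothetical, and the lost reply from BBB2 is an invented scenario, not part of Example 4.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


class Requester:
    """A computing unit issuing [[phi]]@cu_p = ? requests within one query cycle."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # (provider name, query) -> {binding: value} or HALF

    def ask(self, provider_name, provider, query):
        """Return the provider's potential matches, or 1/2 if the reply is lost."""
        key = (provider_name, query)
        if key in self.cache:  # reply already cached in this cycle
            return self.cache[key]
        try:
            reply = provider(query)  # a real unit would wait only until a deadline
            result = {binding: v for binding, v in reply.items() if v >= HALF}
        except TimeoutError:
            result = HALF  # lost or late reply: the whole predicate becomes uncertain
        self.cache[key] = result
        return result

    def end_cycle(self):
        """Forget everything acquired during the last query cycle."""
        self.cache.clear()


# Hypothetical providers: BBB1 answers, while BBB2's reply is lost.
def bbb1(query):
    return {("tr2",): 1} if query == "Train(T)" else {}


def bbb2(query):
    raise TimeoutError("no reply before the deadline")


if __name__ == "__main__":
    bbb3 = Requester("BBB3")
    print(bbb3.ask("BBB1", bbb1, "Train(T)"))  # {('tr2',): 1}
    print(bbb3.ask("BBB2", bbb2, "Train(T)"))  # 1/2
```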


Example 4. Figure 5 shows the beginning of a query evaluation sequence for monitor closeTrains initiated at computing unit BBB3. Calls are asynchronous (cf. actor model), while diagonal lines illustrate the latency of network communication. Message numbers represent the order between the timestamps of messages.

The query is initiated with message 1 (m1 for short), and the first predicate Train of the query is then sent as a request with a free variable parameter T to the other two computing units (m2 and m3). In the reply messages, BBB2 reports tr1 as an object satisfying the predicate (m4), while BBB1 answers that tr2 is a suitable binding for T (m5). Next, BBB3 requests facts about outgoing references of type On leading from objects tr2 and tr1 to objects stored in BBB1 and BBB2, respectively (m6 and m7). In response, each computing unit sends back facts stating the outgoing references from these objects (m8 and m9).

[Sequence diagram: messages m1 to m13 exchanged between BBB1, BBB3 and BBB2, from the request [[closeTrains(St, End)]]@BBB3 = ? to the reply [[ConnectedTo(s2, tu1)]]@BBB1 = 1, [[ConnectedTo(s2, s3)]]@BBB1 = 1]
Fig. 5. Beginning of distributed query execution for monitor closeTrains

The next message (m10) asks for outgoing references of type ConnectedTo from object s2. To send a reply, BBB1 first asks BBB2 to confirm that a reference from s2 to s3 exists, since s3 is hosted by BBB2 (m11). This check adds tolerance against lost messages during model update. After BBB1 receives the answer from BBB2 (m12), it replies to BBB3 with all facts maintained on this node (m13).


Semantics of Distributed Query Evaluation. Each query is initiated at a des- 
ignated computing unit which will be responsible for calculating query results 
by aggregating the partial results retrieved from its neighbors. This aggregation 
has two different dimensions: (1) adding new matches to the result set calculated 
by the provider, and (2) making a potential match more precise. While the first 
case is a consequence of the distributed runtime model and query evaluation, the second case arises from uncertain information due to message loss or delay.
Fortunately, the 3-valued semantics of graph queries (see Table 1) already 
handles the first case: any match reported to the requester by any neighboring 
provider will be included in the query results if its truth evaluation is 1 or 1/2. 
As such, any potential violation of a safety property will be detected, which may 
result in false positive alerts but critical situations would not be missed. 
However, the second case necessitates extra care since query matches coming 
from different sources (e.g. local cache, reply messages from providers) need to 
be fused in a consistent way. This match fusion is carried out at cu as follows: 


— If a match is obtained exclusively from the local runtime model of $cu$, then it is a certain match, formally $[\![\varphi(o_1, \ldots, o_n)]\!]@cu = 1$.

— If a match is sent as a reply by multiple neighboring computing units $cu_j$ (with $cu_j \in \mathit{nbr}(cu)$), then we take the most certain result at $cu$, formally $[\![\varphi(o_1, \ldots, o_n)]\!]@cu := \max\{[\![\varphi(o_1, \ldots, o_n)]\!]@cu_j \mid cu_j \in \mathit{nbr}(cu)\}$.

— Otherwise, tuple $o_1, \ldots, o_n$ is surely not a match: $[\![\varphi(o_1, \ldots, o_n)]\!]@cu = 0$.


Note that the second case uses max{} to assign a maximum of 3-valued logic values with respect to information ordering (which is different from the numerical maximum used in Table 1). Information ordering is a partial order $(\{1/2, 0, 1\}, \sqsubseteq)$ with $1/2 \sqsubseteq 0$ and $1/2 \sqsubseteq 1$. It is worth pointing out that this distributed truth evaluation is also in line with Sobociński's 3-valued logic axioms [33].
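The three fusion rules and the information ordering can be sketched in a few lines of Python. `info_join` computes the least upper bound with respect to ⊑ (1/2 lies below both 0 and 1, which are incomparable), and `fuse` applies the three cases in order. The function names, and the reading of the first case as "the tuple matches on the local model", are assumptions for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def info_join(values):
    """Least upper bound w.r.t. information ordering (1/2 below 0 and below 1)."""
    definite = {v for v in values if v != HALF}
    if len(definite) > 1:  # 0 and 1 have no common upper bound
        raise ValueError("conflicting definite truth values")
    return definite.pop() if definite else HALF


def fuse(local_match, replies):
    """Truth value of a candidate tuple at unit cu.

    local_match: True if the tuple is a match on cu's own runtime model.
    replies: truth values (1 or 1/2) reported for the tuple by neighbours.
    """
    if local_match:
        return 1                 # certain match from the local model
    if replies:
        return info_join(replies)  # most certain result among the neighbours
    return 0                     # surely not a match


if __name__ == "__main__":
    print(fuse(False, [HALF, 1]), fuse(False, [HALF]), fuse(False, []))  # 1 1/2 0
```

Since replies only carry values of at least 1/2, the join here coincides with the numeric maximum; the two orderings would differ only if a definite 0 were fused with 1/2, where the information order yields 0 rather than 1/2.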


Performance Optimizations. Each match sent as a reply to a computing unit 
during distributed query evaluation can be cached locally to speed up the re- 
evaluation of the same query within the query cycle. This caching of query results 
is analogous to memoing in logic programming [46]. Currently, cache invalidation 
is triggered at the end of each query cycle by the local physical clock, which we 
assume to be (quasi-)synchronous with high precision across the platform. 

This memoing approach also enables units to selectively store messages in the local cache depending on their specific needs. Furthermore, this makes it possible to deploy query services to computing units with a limited amount of memory and to prevent memory overflow due to the many messages sent over the network.
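A per-cycle memo table with selective storage might look like the Python sketch below. The capacity bound and the `keep` predicate, which decides which replies are worth caching, are hypothetical stand-ins for the "specific needs" of a memory-constrained unit; entries are dropped whenever the (clock-driven) cycle number advances.

```python
class CycleCache:
    """Memo table for query replies, invalidated whenever a new query cycle starts."""

    def __init__(self, capacity, keep=lambda key: True):
        self.capacity = capacity  # hypothetical bound for a low-memory unit
        self.keep = keep          # hypothetical selective-storage policy
        self.cycle = 0
        self.entries = {}

    def new_cycle(self, cycle):
        """Invalidate all entries when the cycle number (from the local clock) advances."""
        if cycle != self.cycle:
            self.cycle = cycle
            self.entries.clear()

    def get(self, key):
        return self.entries.get(key)

    def put(self, key, value):
        """Store a reply if the policy allows it and there is room; report success."""
        if not self.keep(key):
            return False
        if key not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[key] = value
        return True


if __name__ == "__main__":
    cache = CycleCache(capacity=2, keep=lambda key: key[0] != "Train")
    print(cache.put(("On", "tr2", "St"), {("tr2", "s2"): 1}))  # True
    print(cache.put(("Train", "T"), {}))                       # False
```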

A graph query is evaluated according to a search plan [43], which is a list of predicates ordered so that matches of predicates can be found efficiently.
During query evaluation, free variables of the predicates are bound to a value 
following the search plan. The evaluation terminates when all matches in the 
model are found. An in-depth discussion of query optimization is out of scope 
for this paper, but Sect. 5 will provide an initial investigation. 
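The following Python sketch illustrates search-plan-based evaluation on a centralized, 2-valued set of facts, so it ignores distribution and uncertainty. Each plan step is either an atomic predicate, whose capitalized arguments are variables bound by matching stored facts, or an inequality check between already bound variables. The plan order for closeTrains and the toy model (segments a, b, c in a row with train t1 on a and t2 on c) are hypothetical; they are neither the demonstrator's layout nor an optimized plan.

```python
def is_var(term):
    """Variables are capitalized (St, End, T, ...); objects are lowercase."""
    return term[:1].isupper()


def match_atom(atom, facts, binding):
    """Yield every extension of `binding` that makes the atom a stored fact."""
    pred, *terms = atom
    for fact in facts:
        if fact[0] != pred or len(fact) != len(atom):
            continue
        new = dict(binding)
        for term, value in zip(terms, fact[1:]):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different object
            elif term != value:
                break      # constant argument does not match
        else:
            yield new


def evaluate(plan, facts):
    """Follow the search plan step by step and return all complete bindings."""
    bindings = [{}]
    for step in plan:
        if step[0] == "!=":
            _, x, y = step
            bindings = [b for b in bindings if b[x] != b[y]]
        else:
            bindings = [n for b in bindings for n in match_atom(step, facts, b)]
    return bindings


CLOSE_TRAINS_PLAN = [
    ("Train", "T"), ("On", "T", "St"),
    ("Train", "OT"), ("On", "OT", "End"), ("!=", "T", "OT"),
    ("ConnectedTo", "St", "Mid"), ("ConnectedTo", "Mid", "End"), ("!=", "St", "End"),
    ("RailroadElement", "St"), ("RailroadElement", "Mid"), ("RailroadElement", "End"),
]

TOY_FACTS = {
    ("Train", "t1"), ("Train", "t2"),
    ("RailroadElement", "a"), ("RailroadElement", "b"), ("RailroadElement", "c"),
    ("On", "t1", "a"), ("On", "t2", "c"),
    ("ConnectedTo", "a", "b"), ("ConnectedTo", "b", "a"),
    ("ConnectedTo", "b", "c"), ("ConnectedTo", "c", "b"),
}

if __name__ == "__main__":
    print(sorted((b["St"], b["End"]) for b in evaluate(CLOSE_TRAINS_PLAN, TOY_FACTS)))
    # [('a', 'c'), ('c', 'a')]
```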


Semantic Guarantees and Limitations. Our construction ensures that (1) the 
execution will surely terminate upon reaching the end of the query time win- 
dow, potentially yielding uncertain matches, (2) each local model serves as a 
single source of truth which cannot be overridden by calls to other computing 
units, and (3) matches obtained from multiple computing units will be fused by 
preserving information ordering. The over- and under-approximation properties
of 3-valued logic show that truth values fused this way provide a sound
result (Theorem 1 in [42]). Despite the lack of total consistency, our approach
still provides safety guarantees, since it detects all potentially unsafe situations.
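Guarantee (1) and the safety argument can be illustrated with a small sketch. It assumes that a candidate match is the 3-valued conjunction (numerical minimum) of a local sub-result and of sub-results owned by remote units. If a reply arrives after the query time window, or never arrives, the corresponding sub-result stays at 1/2. The match is then reported as uncertain rather than silently dropped. All names are hypothetical.

```python
from fractions import Fraction

UNKNOWN = Fraction(1, 2)  # truth value 1/2


def collect_until_deadline(replies, deadline):
    """Keep replies that arrive before the end of the query time window.

    replies: (arrival_time, unit, truth) tuples sorted by arrival time.
    Stopping at the deadline guarantees termination even if messages are lost.
    """
    received = {}
    for arrival, unit, truth in replies:
        if arrival > deadline:
            break
        received[unit] = truth
    return received


def evaluate_match(local_truth, remote_units, received):
    """3-valued conjunction (numerical minimum) of local and remote sub-results.

    A remote unit whose answer was lost or late contributes 1/2 (unknown).
    """
    remote = [received.get(unit, UNKNOWN) for unit in remote_units]
    return min([local_truth, *remote])


replies = [(0.1, "BBB2", 1), (0.4, "BBB3", 1)]  # BBB3 answers too late
received = collect_until_deadline(replies, deadline=0.3)
verdict = evaluate_match(1, ["BBB2", "BBB3"], received)  # uncertain: 1/2
```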

There are also several assumptions and limitations of our approach. We use
asynchronous communication without broadcast messages. We only consider
faults of communication links, not failures of computing units. We also
exclude the case when computing units maliciously send false information.
Instead of refreshing local caches in each cycle, the runtime model could incorporate
information aging, which may make it possible to handle other sources of uncertainty
(currently limited to consequences of message loss). Finally, in case of
longer cycles, the runtime model may no longer provide up-to-date information
at query evaluation time.


Implementation Details. The concepts presented in the paper are implemented 
in a prototype software, which has three main components: (i) an EMF-based 
tool [39] for data modeling and code generation for the runtime model, (ii) an 
Eclipse-based tool for defining and compiling monitoring rules built on top of the 
VIATRA framework [41], and (iii) the runtime environment to evaluate queries. 
The design tools are predominantly implemented in Java. We used EMF metamodels
for data modeling, but created a code generator to derive lightweight
C++ classes as representations of the runtime model. The query definition
environment was extended to automatically compile queries into C++ monitors.
The runtime monitoring libraries and the runtime framework are available in
C++. Our choice of C++ is motivated by its low runtime and memory overhead
on almost any type of platform, ranging from low-energy embedded microcontrollers
to large-scale cloud environments. Technically, a generic query service
can start query runners for each monitoring objective on each node. While query
runners execute the query-specific search plan generated at compile time, the
network communication is handled by the query service if needed. To serialize the
data between different nodes, we used the lightweight Protocol Buffers [16].
5 Evaluation


We conducted measurements to evaluate and address two research questions: 


Q1: How does distributed graph query execution perform compared to executing 
the queries on a single computing unit? 

Q2: Is query evaluation performance affected by alternative allocation of model 
objects to host computing units? 


5.1 Measurement Setup 


Computation Platform. We used the real distributed (physical) platform of the 
CPS demonstrator to answer these research questions (instead of setting up a 
virtual environment). It consists of 6 interconnected BBB devices (all running 
embedded Debian Jessie with PREEMPT-RT patch) connected to the railway 
track itself. This arrangement represents a distributed CPS with several com- 
puting units having only limited computation and communication resources. We 
used these units to maintain the distributed runtime model and evaluate
monitoring queries. This way we are able to provide a realistic evaluation;
however, due to the fixed number of embedded devices built into the platform, we
cannot evaluate the scalability of the approach wrt. the number of computing units.


CPS Monitoring Benchmark. To assess the distributed runtime verification 
framework, we used the MoDeS3 railway CPS demonstrator where multiple 
safety properties are monitored. They are all based on important aspects of the 
domain, and they have been integrated into the real monitoring components. 
Our properties of interest (in increasing complexity of queries) are the following: 


— Train locations: gets all trains and the segments on which trains are located. 

— Close trains: this pattern is the one introduced in Fig. 4. 

— Derailment: detects a train approaching a turnout that is set to the other
direction (which would cause the train to run off the track).

— End of siding: detects trains approaching an end of the track. 


Since the original runtime model of the CPS demonstrator has only a total of 
49 objects, we scaled up the model by replicating the original elements (except 
for the computing units). This way we obtained models with 49–43006 objects
and 114–109015 links, with structural properties similar to the original one.
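The replication step could look roughly like the following sketch. The text does not specify how replicas are interconnected. As an assumption, replicas here are linked only through the shared computing-unit objects. The type name ComputingUnit and the link labels are hypothetical.

```python
def replicate(objects, links, copies, shared_types=("ComputingUnit",)):
    """Scale a model by replicating every object except those of shared_types.

    objects: dict id -> type; links: list of (label, src, dst).
    Objects of shared types (e.g. computing units) are kept once and are
    linked from every replica.
    """
    def copy_id(obj, i):
        return obj if objects[obj] in shared_types else f"{obj}#{i}"

    new_objects, new_links, seen = {}, [], set()
    for i in range(copies):
        for obj, typ in objects.items():
            new_objects[copy_id(obj, i)] = typ
        for label, src, dst in links:
            link = (label, copy_id(src, i), copy_id(dst, i))
            if link not in seen:  # links between shared objects appear only once
                seen.add(link)
                new_links.append(link)
    return new_objects, new_links


objects = {"t1": "Train", "s1": "Segment", "bbb1": "ComputingUnit"}
links = [("on", "t1", "s1"), ("hosts", "bbb1", "s1")]
big_objects, big_links = replicate(objects, links, copies=3)
```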


Query Evaluation Benchmark. In order to provide an independent evaluation 
for our model query-based monitoring approach, we adapted the open-source 
Train Benchmark [35] that aims at comparing query evaluation performance of 
various tools. This benchmark defines several queries describing violations of 
well-formedness constraints with different complexity over graph models. More- 
over, it also provides a model generator to support scalability assessment. 
5.2 Measurement Results


Execution Times. The query execution times over models deployed to a single 
BBB were first measured to obtain a baseline evaluation time of monitoring for 
each rule (referred to as local evaluation). Then the execution times of system- 
level distributed queries were measured over the platform with 6 BBBs, evalu- 
ating two different allocations of objects (standard and alternative evaluations). 

In Fig. 6 each result captures the times of 29 consecutive evaluations of queries 
excluding the warm-up effect of an initial run which loads the model and cre- 
ates necessary auxiliary objects. A query execution starts when a node initiates 
evaluation, and terminates when all nodes have finished collecting matches and 
sent back their results to the initiator. 
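A sketch of this measurement protocol in Python follows. It assumes that run_query is a hypothetical blocking callable that returns once the initiator has received all results. The speedup helper uses one common definition, the ratio of mean local time to mean distributed time. The exact aggregation behind the reported figures is not stated here.

```python
import statistics
import time


def measure(run_query, runs=29):
    """Time `runs` consecutive evaluations after one discarded warm-up run."""
    run_query()  # warm-up: loads the model and creates auxiliary objects
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return samples


def speedup(local_samples, distributed_samples):
    """Average speed-up of distributed over local-only evaluation."""
    return statistics.mean(local_samples) / statistics.mean(distributed_samples)
```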


Overhead of Distributed Evaluation. On the positive side, the performance of graph
query evaluation on a single unit is comparable to other graph query techniques
reported in [35] for models with over 100K objects, which shows a certain level
of maturity of our prototype. Furthermore, on the CPS demonstrator, distributed
query evaluation yielded significantly better results than local-only execution
for the Derailment query on medium-size models (4K–43K objects, with
2.23×–2.45× average speed-up) and comparable runtime for the Close trains
and Train locations queries on these models (the greatest average difference
being 30 ms across all model sizes). However, distributed query evaluation had
problems with End of siding, a complex query with negative application
conditions, which points to clear directions for future research. Nevertheless, the
parallelism of even a small execution platform with only 6 computing units could
outweigh the communication overhead between units for several distributed queries,
which is a promising outcome.


Impact of Allocation on Query Evaluation. We synthesized different allocations 
of model elements to computing units to investigate the impact of allocation 
of model objects on query evaluation. With the CPS demonstrator model in 
particular, we chose to allocate all Trains to BBB1, and assigned every other 
node stored previously on BBB1 to the rest of the computing units. Similarly, 
for the Train Benchmark models, we followed this pattern with selected types, 
in addition to experimenting with fully random allocation of objects. 
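One way to reason about such allocations is to count links whose endpoints end up on different units, since traversing these links may require network messages. The sketch below uses this count as an assumed proxy for locality; it is not a metric from the measurements. The helpers and the toy model are hypothetical.

```python
import random


def remote_link_ratio(links, allocation):
    """Fraction of links whose two endpoints are hosted by different units."""
    if not links:
        return 0.0
    remote = sum(1 for _, src, dst in links if allocation[src] != allocation[dst])
    return remote / len(links)


def pin_types(objects, allocation, pinned):
    """Copy of `allocation` with every object of a pinned type moved to its unit."""
    return {o: pinned.get(objects[o], allocation[o]) for o in objects}


def random_allocation(objects, units, seed=0):
    """Fully random allocation of objects to units (reproducible via seed)."""
    rng = random.Random(seed)
    return {o: rng.choice(units) for o in objects}


objects = {"t1": "Train", "t2": "Train", "s1": "Segment", "s2": "Segment"}
links = [("on", "t1", "s1"), ("on", "t2", "s2")]
standard = {"t1": "BBB1", "s1": "BBB1", "t2": "BBB2", "s2": "BBB2"}
trains_on_bbb1 = pin_types(objects, standard, {"Train": "BBB1"})
units = [f"BBB{i}" for i in range(1, 7)]
shuffled = random_allocation(objects, units)
```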

The two right-most columns of Fig. 6a and 6b show results of two alternative
allocations for the same search plan, with a peak difference of 2.06× (Derailment)
and 19.92× (Semaphore neighbor) in the two cases. However, both of these
allocations were manually optimized to exploit locality of model elements. In case
of random allocations, the difference in runtime may reach an order of magnitude¹.
Therefore, new allocation strategies and search plans for distributed queries
are worth investigating in future work.


Threats to Validity. The generalizability of our experimental results is limited by 
certain factors. First, to measure the performance of our approach, the platform
devices (1) executed only query services and (2) were connected to an isolated local
area network via Ethernet. Performance on a real network with a busy channel
would likely suffer from longer delays and message losses, thus increasing execution
time. Second, we assessed performance using a single query plan synthesized
automatically by the VIATRA framework with heuristics intended for a
single computing unit. We believe that execution times of distributed queries
would likely decrease with a carefully constructed search plan and allocation.


6 Related Work 


Runtime Verification Approaches. For continuously evolving and dynamic CPSs, 
an upfront design-time formal analysis needs to incorporate and check the robust- 
ness of component behavior in a wide range of contexts and families of config- 
urations, which is a very complex challenge. Thus consistent system behavior 
is frequently ensured by runtime verification (RV) [24], which checks (poten- 
tially incomplete) execution traces against formal specifications by synthesizing 
verified runtime monitors from provenly correct design models [21, 26]. 

Recent advances in RV (such as MOP [25] or LogFire [17]) promote capturing
specifications in rich logics over quantified and parameterized events (e.g. quantified
event automata [4] and their extensions [12]). Moreover, Havelund proposed
to check such specifications on-the-fly by exploiting rule-based systems based on
the RETE algorithm [17]. However, this technique only incorporates low-level
events, while changes of an underlying data model are not considered as events.


1 See Appendix A for details under http://bit.ly/2op3tdy. 
Traditional RV approaches use variants of temporal logics to capture the
requirements [6]. Recently, novel combinations of temporal logics with
context-aware behaviour description [15,19] (developed within the R3-COP and R5-COP
FP7 projects) have appeared for the runtime verification of autonomous CPS,
providing a rich language to define correctness properties of evolving systems.


Runtime Verification of Distributed Systems. While several techniques exist for
runtime verification of sequential programs, the authors of [29] claim that much
less research has been done in this area for distributed systems. Furthermore,
they provide the first sound and complete algorithm for runtime monitoring of
distributed systems based on the 3-valued semantics of LTL.
The recently introduced Brace framework [49] supports RV in distributed
resource-constrained environments by incorporating dedicated units in the
system to support global evaluation of monitoring goals. The approach in [5]
focuses on evaluating LTL formulae in a fully distributed manner for components
communicating on a synchronous bus in a real-time system. Additionally, a
machine learning-based solution for scalable fault detection and diagnosis is
presented in [2] that builds on correlation between observable system properties.


Distributed Graph Queries. Highly efficient techniques for local-search-based [9]
and incremental [40] model queries were developed as part of the VIATRA framework,
where the latter mainly build on RETE networks as baseline technology. In [34], a
distributed incremental graph query layer deployed over a cloud infrastructure 
with numerous optimizations was developed. Distributed graph query evaluation 
techniques were reported in [22,27,32], but none of these techniques considered 
an execution environment with resource-constrained computation units. 


Runtime Models. The models@runtime paradigm [8] serves as the conceptual
basis for the Kevoree framework [28] (developed within the HEADS FP7
project). Other recent distributed, data-driven solutions include the Global Data 
Plane [48] and executable metamodels at runtime [44]. However, these frame- 
works currently offer very limited support for efficiently evaluating queries over 
a distributed runtime platform, which is the main focus of our current work. 


7 Conclusions 


In this paper, we proposed a runtime verification technique for smart and safe 
CPSs by using a high-level graph query language to capture safety properties for 
runtime monitoring and runtime models as a rich knowledge representation to 
capture the current state of the running system. A distributed query evaluation 
technique was introduced where none of the computing units has a global view 
of the complete system. The approach was implemented and evaluated on the 
physical system of MoDeS3 CPS demonstrator. Our first results show that it 
scales for medium-size runtime models, and the actual deployment of the query 
components to the underlying platform has significant impact on execution time. 
In the future, we plan to investigate how to characterize effective search plans 
and allocations in the context of distributed queries used for runtime monitoring. 
Abstract. JavaScript web applications (apps) are prevalent these days,
and quality assurance of web apps gets even more important. Even 
though researchers have studied various analysis techniques and software 
industries have developed code analyzers for their own code repositories, 
statically analyzing web apps in a sound and scalable manner is chal- 
lenging. On top of dynamic features of JavaScript, abundant execution 
flows triggered by user events make a sound static analysis difficult. 

In this paper, we propose a novel EventHandler (EH)-based static
analysis for web apps using dynamically collected state information.
Unlike traditional whole-program analyses, the EH-based analysis intentionally
analyzes partial execution flows using concrete user events. Such
analyses inevitably miss some execution flows of the entire program, but they
analyze fewer infeasible flows and thus report fewer false positives. Moreover, they
can finish analyzing partial flows of web apps that whole-program analyses
often fail to finish analyzing, and produce partial bug reports. Our experimental
results show that the EH-based analysis improves the precision
dramatically compared with a state-of-the-art JavaScript whole-program
analyzer, and it can finish analysis of partial execution flows in web apps
that the whole-program analyzer fails to analyze within a timeout.


Keywords: JavaScript · Web applications · Event analysis · Static analysis


1 Introduction 


Web applications (apps) written in HTML, CSS, and JavaScript have become 
prevalent, and JavaScript is now the 7th most popular programming language [22].
Because web apps can run on any platform or device that provides a browser,
they are widely used. The overall structure of web apps
is specified in HTML, which is represented as a tree structure via Document 
Object Model (DOM) APIs. CSS describes visual effects like colors, positions, 
and animation of contents of the web app, and JavaScript handles events trig- 
gered by user interaction. JavaScript code can change the status of the web app 
by interoperating with HTML and CSS, load other JavaScript code dynamically,
and access device-specific features via APIs provided by underlying platforms. 
JavaScript is the de facto standard language for web programming these days. 

To help developers build high-quality web apps, researchers have studied var- 
ious analysis techniques and software industries have developed in-house static 
analyzers. Static analyzers such as SAFE [12,15], TAJS [2,10], and WALA [19] 
analyze JavaScript web apps without concretely executing them, and dynamic 
analyzers such as Jalangi [20] utilize concrete values obtained by actually exe- 
cuting the apps. Thus, static analysis results aim to cover all the possible execu- 
tion flows but they often contain infeasible execution flows, and dynamic anal- 
ysis results contain only real execution flows but they often struggle to cover 
abundant execution flows. Such different analysis results are meaningful for dif- 
ferent purposes: sound static analysis results are critical for verifying absence 
of bugs and complete dynamic analysis results are useful for detecting gen- 
uine bugs. In order to enhance the quality of their own software, IT companies 
develop in-house static analyzers like Infer from Facebook [4] and Tricorder from 
Google [18]. 

However, statically analyzing web apps in a sound and scalable manner is 
extremely challenging. Especially because JavaScript, the language that controls
web apps, is highly dynamic, purely static analysis has various limitations.
For example, JavaScript can generate code to execute from strings during
evaluation, and such code is not available to static analyzers before run time. In
addition, dynamically adding and deleting object properties, and treating prop- 
erty names as values make statically analyzing them difficult [17]. Moreover, 
since execution flows triggered by user events are abundant, statically analyzing 
them often incurs analysis performance degradation [16]. 

Among many challenges in statically analyzing JavaScript web apps, we 
focus on analysis of event-driven execution flows in this paper. Most existing
JavaScript static analyzers target the analysis of web apps at loading time
and over-approximate event-driven execution flows to be sound. In order
to consider all possible event sequences soundly, they abstract the event-driven
semantics in a way that any events can happen in any order. Such a sound
event modeling contains many infeasible event sequences, which lead to unnecessary
computation and imprecise analysis results. Thus, the state-of-the-art
JavaScript static analyzers often fail to analyze event flows in web apps. 

In this paper, we propose a novel EventHandler-based (EH-based) static anal- 
ysis for web apps using dynamically collected state information. First, we present 
a new analysis unit, an EH. While traditional static analyzers perform
whole-program analysis covering all possible execution flows, the EH-based analysis
aims to analyze partial execution flows triggered by user events more precisely.
In other words, unlike the whole-program analysis that starts analyzing from
a single entry point of a given program, the EH-based analysis considers each
event function call triggered by a user event as an entry point. Because the
EH-based analysis analyzes only a subset of the entire execution flows
at a time, it analyzes fewer infeasible execution flows than the whole-program
analysis, which balances soundness and precision. Moreover, since it considers a
smaller set of execution flows, it may finish analysis of web apps that the
whole-program analysis fails to analyze within a reasonable timeout. Second, in order
to analyze each event function call in arbitrary call contexts, we present a hybrid 
approach to construct an abstract heap for the event function call. More specifi- 
cally, to analyze each event function body, the analyzer should have information 
about non-local variables. Thus, for each event function, we construct a conser- 
vative abstract initial heap that holds abstract values of non-local variables by 
abstraction of dynamically collected states. 

We formally present the mechanism as a framework, EHA, parameterized by 
a dynamic event generator and a static whole-program analyzer. After describing
the high-level structure of EHA, we present its prototype implementation,
EHA_SAFE, instantiated with manual event generation and a state-of-the-art
JavaScript static analyzer SAFE. Our experimental results show that EHA_SAFE
indeed reports fewer false positives than SAFE, and it can finish analysis of parts
of web apps that SAFE fails to analyze within the timeout of 72 h.

Our paper makes the following contributions: 


— We propose EHA, a bug detection framework that performs static analysis 
for each event handler as an entry point using an abstraction of dynamically 
collected states as an initial heap. 

— We present EHA_SAFE, an instantiation of EHA with manual event generation
and SAFE, which is applicable to real-world web apps. 

— We evaluate EHA_SAFE in terms of analysis coverage and precision.

The remainder of this paper is organized as follows. We first explain the 
concrete semantics of event handlers in web apps, describe how existing whole- 
program analyzers handle events in a sound but unscalable manner, and present 
an overview of our approach using concrete code examples in Sect. 2. We describe 
EHA and its prototype implementation in Sect. 3 and Sect. 4, respectively. We
evaluate the EHA instance using real-world web apps in Sect. 5, discuss related
work in Sect. 6, and conclude in Sect. 7 with future work.


2 Analyses of Event Handlers 


2.1 Event Handlers in Web Apps 


Web apps may receive events from their execution environments like browsers 
or from users¹. When a web app receives an event, it reacts to the event by
executing JavaScript code registered as a handler (or a listener) of the event. 
An event handler consists of three components: an event target, an event type, 
and a callback function. An event target may be any DOM object like Element, 
window, and XMLHttpRequest. An event type is a string representation of the event 
action type such as "load", "click", and "keydown". Finally, a callback function 
is a JavaScript function to be executed when its corresponding event occurs. 


1 http://www.w3schools.com/js/js_events.asp. 
Fig. 1. (a) A conservative modeling of event control flows (b) Modeling in TAJS [9]


Users execute web apps by triggering various events, so we consider
sequences of events triggered by users as user inputs to web apps. During
execution, the set of event handlers that a user can execute may vary. First,
because event handlers are dynamically registered to and removed from DOM 
objects, executable event handlers for an event change at run time. For example, 
when a DOM object has only the following event handler registered: 


(A, "click", function f(){ B.addEventListener("click", function g(){}); }) 


if a user clicks the target A, a new event handler is registered, which
makes two handlers executable. Second, changes in the DOM state of a web app
also change the set of executable event handlers for an event. For instance, an
event target may be removed from the document via DOM API calls, which makes
the detached event target inaccessible to users. Also, events may not be
captured depending on their capturing/bubbling options and CSS style settings of
visibility or display. In addition, it is a common practice to manipulate CSS 
styles like the following: 


— HTMLElement.style.opacity = 0; 
— HTMLElement.style.zIndex = n; 


to hide an element such as a button under another element, making it inaccessible
to users. These various features affect the event sequences that users can trigger
and the event handlers that are executed accordingly.
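The following Python sketch models an event handler as the (target, type, callback) triple described at the start of this section. It replays the example above: clicking A registers a handler on B, and covering B with another element removes that handler from the executable set again. This is a deliberately coarse, hypothetical model. Accessibility is reduced to "detached" and "covered" sets, and capturing/bubbling as well as visibility/display are ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class EventHandler:
    target: str                   # event target, e.g. an element id or "window"
    event_type: str               # e.g. "load", "click", "keydown"
    callback: Callable[[], None]  # function run when the event occurs


@dataclass
class PageModel:
    handlers: List[EventHandler] = field(default_factory=list)
    detached: Set[str] = field(default_factory=set)  # removed from the document
    covered: Set[str] = field(default_factory=set)   # hidden under another element

    def add_event_listener(self, target, event_type, callback):
        self.handlers.append(EventHandler(target, event_type, callback))

    def executable(self):
        """Handlers a user can currently trigger under this coarse model."""
        return [h for h in self.handlers
                if h.target not in self.detached and h.target not in self.covered]

    def user_triggers(self, target, event_type):
        # executable() builds a fresh list, so handlers registered during
        # this dispatch only become triggerable afterwards.
        for h in self.executable():
            if (h.target, h.event_type) == (target, event_type):
                h.callback()


page = PageModel()


def g():
    pass


def f():  # mirrors (A, "click", function f(){ B.addEventListener("click", g) })
    page.add_event_listener("B", "click", g)


page.add_event_listener("A", "click", f)
before = len(page.executable())       # 1: only f on A
page.user_triggers("A", "click")
after_click = len(page.executable())  # 2: f on A and g on B
page.covered.add("B")                 # e.g. B placed under another element via zIndex
after_hide = len(page.executable())   # 1 again
```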


2.2 Analysis of Event Handlers in Whole-Program Analyzers 


Most existing whole-program JavaScript analyzers handle event handlers in a 
sound but unscalable manner as illustrated in Fig. 1(a). They first analyze top- 
level code that is statically available in a given web app; event handlers may be 
registered during the analysis of top-level code. Then, after the “exit block of 
top-level code” node, they analyze code initiated by event handlers in any order
and any number of times, as denoted by the “trigger all event handlers” node.
According to this modeling of event control flows, all possible event sequences
that occur after loading the top-level code are soundly analyzed. Note that even
though whole-program analyzers use this sound event modeling, the analyzers
themselves may not be sound because of other features like dynamic code
generation. However, because registered event handlers may be removed during
evaluation and may even be inaccessible due to some CSS styles as discussed
in Sect. 2.1, the event modeling in Fig. 1(a) may contain too many infeasible
event sequences that are impossible in concrete executions. Analysis with many
infeasible event sequences involves unnecessary computation that wastes
analysis time, and often yields imprecise analysis results. Such a conservative
modeling of event control flows indeed reports many false positives [16].

To reduce the number of infeasible event sequences to analyze, TAJS uses
a refined modeling of event control flows as shown in Fig. 1(b). Among various
event handlers, this modeling distinguishes “load event handlers” and analyzes
them before all the other event handlers. While this modeling is technically
unsound because non-load events may precede load events [15], most web apps
satisfy this modeling in practice. Moreover, because load event handlers often
initialize top-level variables, the event modeling in Fig. 1(a) often produces false
positives by analyzing non-load event functions before load event functions
initialize top-level variables. In contrast, the TAJS modeling reduces such
false positives by analyzing load event handlers before non-load event handlers.
Although the TAJS modeling distinguishes load events, the over-approximation
of the other event handler calls still causes analysis precision and scalability
issues.
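To give a feel for how much the two modelings over-approximate, the sketch below enumerates the event sequences each one admits, up to length 4. It uses the three-handler example of Sect. 2.3 (load handler l, pop-up a, close button b) and compares the result with the feasible sequences l(ab)*a?. Encoding the TAJS refinement as "no load handler after a non-load handler" is our simplification.

```python
import re
from itertools import product


def conservative(handlers, n):
    """Fig. 1(a): any handler, in any order, any number of times (length <= n)."""
    return ["".join(s) for k in range(n + 1) for s in product(handlers, repeat=k)]


def load_first(handlers, load, n):
    """Fig. 1(b)-style: no load handler runs after a non-load handler."""
    pattern = re.compile(f"[^{load}][{load}]")
    return [s for s in conservative(handlers, n) if not pattern.search(s)]


def feasible(seqs):
    """Sequences allowed by the example app of Sect. 2.3: l(ab)*a?"""
    return [s for s in seqs if re.fullmatch(r"l(ab)*a?", s)]


all_seqs = conservative("lab", 4)      # 121 sequences (including the empty one)
tajs_seqs = load_first("lab", "l", 4)  # 57 sequences
real_seqs = feasible(all_seqs)         # ['l', 'la', 'lab', 'laba']
```

Even with the load-first refinement, the modeled set is roughly fourteen times larger than the feasible one at this small bound.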


2.3 Analysis of Event Handlers in EH-Based Analyzers 


To alleviate the analysis precision and scalability problem due to event modeling, 
we propose the EHA framework, which aims to analyze a subset of execution flows 
within a limited time budget to detect bugs in partial execution flows rather 
than to analyze all execution flows. EHA relies on two key ideas to achieve this
goal. First, it slices the entire execution flows by using each event handler as an
individual entry point, which amounts to considering a given web app as a collection
of smaller web apps. This slicing breaks the loop structures
in existing event modelings shown in Fig. 1. Second, in order to analyze sliced 
event control flows in various contexts, EHA constructs an initial abstract heap 
of each entry point that contains necessary information to analyze a given event 
control flow by abstracting dynamically collected states. More specifically, EHA 
takes two components—a dynamic event generator and a static analyzer—and 
collects concrete values of non-local variables of event functions via the dynamic 
event generator, and abstracts the collected values using the static analyzer. 
Let us compare static, dynamic, and EH-based analyses with an example. We
assume that top-level code registers three event handlers, l, a, and b, where l
denotes a load event handler, which precedes the others and runs once. In
addition, a and b simulate a pop-up and its close button, respectively. Thus, we can
represent possible event sequences as a regular expression: l(ab)*a?. For a given
event sequence lababa, Fig. 2 represents the event flows analyzed by each analy- 
sis technique. A conservative static analysis contains infeasible event sequences 
like the ones starting with a or b, whereas a dynamic analysis covers only short 
prefixes out of infinitely many flows. The EH-based analysis slices the web app 
into three handler units: l, a, and b. Hence, there is no loop in the event model- 
ing; each handler considers every prefix of the given event sequence that ends with 
itself. For example, the handler a considers la, laba, and lababa as possible event 
sequences. Moreover, instead of abstracting the evaluation result of each sequence
separately and merging the abstractions, it first merges the evaluation results of the
sequences just before the handler a (l, lab, and labab) and uses their abstraction as
the initial heap for analyzing a, which analyzes more event flows.
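The slicing can be reproduced mechanically. For each handler, collect the prefixes of the observed sequence that end with it. Also collect the prefixes just before it, whose resulting states are merged into its initial heap. A minimal sketch, with a hypothetical function name:

```python
import re


def eh_units(sequence):
    """Slice an observed event sequence into per-handler analysis units.

    For each handler h: 'flows' are the prefixes ending with h that the unit
    for h analyzes; 'before' are the prefixes just before h, whose resulting
    states are merged (and then abstracted) into h's initial heap.
    """
    units = {}
    for i, handler in enumerate(sequence):
        unit = units.setdefault(handler, {"flows": [], "before": []})
        unit["flows"].append(sequence[: i + 1])
        unit["before"].append(sequence[:i])
    return units


observed = "lababa"
assert re.fullmatch(r"l(ab)*a?", observed)  # a feasible sequence of the example
units = eh_units(observed)
# units["a"] == {"flows": ["la", "laba", "lababa"], "before": ["l", "lab", "labab"]}
```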


Fig. 2. Event flows analyzed by (a) static, (b) dynamic, and (c) EH-based analyses.
3 Technical Details


This section discusses the EHA framework, which consists of five phases as
shown in Fig. 3. Boxes denote modules and ellipses denote data. EHA takes three
inputs: a web app (Web App) to analyze for bugs, and two modules
to use as its components, namely a dynamic event sequence generator (Event Generator)
and a static analyzer (Static Analyzer). During the first, instrumentation phase,
Instrumentor inserts code that dynamically collects states into the input web app.
Then, during the execution phase, the Instrumented Web App runs on a browser,
producing Collected States. During this phase, the input module Event Generator
repeatedly receives states of the running web app and sends user events to it.
In the third, unit building phase, Unit Web App Builder constructs a small
Unit Web App for each event handler from Collected States. After the other input
module, Static Analyzer, analyzes the set of Unit Web Apps in the static analysis
phase, Alarm Aggregator summarizes the resulting set of Bug Reports and generates
a Final Bug Report for the original input Web App in the final alarm aggregation 
phase. We now describe each phase in more detail. 
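The data flow between the five phases can be summarized as a higher-order function whose parameters play the roles of the modules in Fig. 3. This is a hypothetical sketch, and the callables are stand-ins. De-duplicating reports by (location, message) is an assumed aggregation policy, since the text only says that Alarm Aggregator summarizes the reports.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Report = Tuple[str, str]  # (source location, message)


def aggregate(reports: Iterable[Iterable[Report]]) -> List[Report]:
    """Alarm aggregation: merge per-unit bug reports, dropping duplicates."""
    merged = set()
    for unit_reports in reports:
        merged.update(unit_reports)
    return sorted(merged)


def eha(web_app: str,
        instrument: Callable[[str], str],
        run_with_events: Callable[[str], List[Tuple[str, Dict]]],
        build_units: Callable[[List[Tuple[str, Dict]]], List[str]],
        analyze: Callable[[str], List[Report]]) -> List[Report]:
    """The five EHA phases; all callables are hypothetical stand-ins.

    run_with_events plays the browser driven by Event Generator;
    analyze plays the input Static Analyzer.
    """
    instrumented = instrument(web_app)             # 1. instrumentation
    collected = run_with_events(instrumented)      # 2. execution
    unit_apps = build_units(collected)             # 3. unit building
    bug_reports = [analyze(u) for u in unit_apps]  # 4. static analysis
    return aggregate(bug_reports)                  # 5. alarm aggregation
```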


Inst(h = <head>) = h.addChildFront(<script src="helper" />)
Inst(function f(···) b) = function f(···){
    var envId = getNewEnvId(); var nonlocals = {x1': x1, ···};
    pushCallStack(); collectState(nonlocals); b; popCallStack(); }
Inst(return x;) = { var retVal = x; popCallStack(); return retVal; }
Inst(catch(e){ b }) = catch(e){ popCallStack(); b }
Inst(x = e) = x = e; update(x', x)
Inst(x ⊙) = x⊙; update(x', x, ⊙)
Inst(⊙ x) = ⊙x; update(x', x)


Fig. 4. Instrumentation rules (partial) 


Instrumentation Phase. The first phase instruments a given web app so that the 
instrumented web app can record dynamically collected states during execution. 
Figure 4 presents the instrumentation rules for the most important cases, where
the unary operator ⊙ is either ++ or --. For presentation brevity, we abuse the
notation and write x' to denote the string representation of a variable name x.
The Inst function converts necessary JavaScript language constructs to others
that perform dynamic logging. For example, for each function declaration of f, 
Inst inserts four statements before the function body and one statement after 
the function body to keep track of non-local variables of the function f. 
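The effect of the rule for function declarations can be mimicked in Python with a decorator. The sketch below is an analogy, not the actual JavaScript instrumentation. It snapshots the listed non-local variables on entry and maintains a call stack. As a simplification, it records a state only when the call stack depth is 1, i.e. at event callback entries. The try/finally construct plays the role of the popCallStack() calls inserted by the return and catch rules. All names besides those in Fig. 4 are hypothetical.

```python
import copy
import functools

call_stack = []        # analogue of pushCallStack()/popCallStack()
collected_states = []  # (function name, snapshot of non-local variables)


def instrument(env, nonlocal_names):
    """Wrap a function the way Inst rewrites `function f(...) b`."""
    def wrap(f):
        @functools.wraps(f)
        def inst(*args, **kwargs):
            nonlocals = {n: copy.deepcopy(env[n]) for n in nonlocal_names}
            call_stack.append(f.__name__)
            if len(call_stack) == 1:  # collectState: keep callback entries only
                collected_states.append((f.__name__, nonlocals))
            try:
                return f(*args, **kwargs)
            finally:
                call_stack.pop()  # popped on normal return and on exceptions
        return inst
    return wrap


app_env = {"count": 0}  # stands in for the web app's non-local variables


@instrument(app_env, ["count"])
def helper():
    app_env["count"] += 1


@instrument(app_env, ["count"])
def on_click():  # imagine this is registered as a click callback
    helper()     # nested call at depth 2: not recorded
    app_env["count"] += 1


on_click()
on_click()
# collected_states == [("on_click", {"count": 0}), ("on_click", {"count": 2})]
```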


Execution Phase. The execution phase runs an instrumented web app on a 
browser using events generated by Event Generator. Because EHA is parameterized
by the input Event Generator, the generator may be an automated testing tool or
manual effort. The following definitions formally specify the concepts used in the
execution phase and the rest of this section:


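$$\begin{array}{llllll}
\text{Execution} & \sigma \in S^* & \text{State} & s \in S = P \times H & \text{ProgramPoint} & p \in P \\
\text{Heap} & h \in H = A \rightarrow O & \text{Address} & a \in A & \text{Object} & O = F \rightarrow V \\
\text{Field} & f \in F & \text{Value} & V = V_p \uplus A & \text{Primitive Value} & V_p
\end{array}$$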


An execution σ of a web app is a sequence of states that are results of evaluation
of the web app code. We omit how states change according to the evaluation of 
different language constructs, but focus on which states are collected during exe- 
cution. A state s is a pair of a program point p denoting the source location of the 
code being evaluated and a heap h denoting a memory status. A heap is a map 
from addresses to objects. An address is a unique identifier assigned whenever an 
object is created, and an object is a map from fields to values. A field is an object 
property name and a value is either a primitive value or an address that denotes 
an object. For presentation brevity, we abuse Object to represent Environment as 
well, which is a map from variables to values. Then, EHA collects states at event 
callback entries during execution: 


$$\mathit{CollectedStates}(\sigma) = \{\, s \mid s \in \sigma \wedge s \text{ is at an event callback entry} \,\}$$

that is, the states whose program points are function entries and whose call
stack depth is 1.
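The definitions above can be transcribed into TypeScript types, as in the following sketch. It is an illustration under simplifying assumptions that are not part of the formal model: program points are strings, addresses are tagged objects so that they are distinguishable from primitive values, and each state additionally carries an `isFunctionEntry` flag and its `callDepth`, which the model leaves implicit. `collectedStates` then keeps exactly the states at event callback entries.

```typescript
type ProgramPoint = string;                     // p ∈ P, e.g. "app.js:42"
type Address = { addr: number };                // a ∈ A
type Primitive = number | string | boolean | null | undefined; // v_p ∈ V_p
type Value = Primitive | Address;               // V = V_p ⊎ A
type Field = string;                            // f ∈ F
type JsObject = Map<Field, Value>;              // O = F → V
type Heap = Map<number, JsObject>;              // H = A → O, keyed by addr

interface State {
  p: ProgramPoint;
  h: Heap;
  // Extra bookkeeping assumed for this sketch only:
  isFunctionEntry: boolean;
  callDepth: number;
}

type Execution = State[];                       // σ ∈ S*

// CollectedStates(σ): states at function entries with call stack depth 1.
function collectedStates(sigma: Execution): State[] {
  return sigma.filter((s) => s.isFunctionEntry && s.callDepth === 1);
}

// Example execution: top-level code, a callback entry, and a nested call.
const heap: Heap = new Map([[0, new Map<Field, Value>([["score", 0]])]]);
const sigma: Execution = [
  { p: "global", h: heap, isFunctionEntry: false, callDepth: 0 },
  { p: "onClick", h: heap, isFunctionEntry: true, callDepth: 1 },
  { p: "helper", h: heap, isFunctionEntry: true, callDepth: 2 },
];
const collected = collectedStates(sigma);
```

Only the `onClick` entry is collected: the top-level state is not a function entry, and the `helper` entry is at depth 2.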


Unit Building Phase. As shown in Fig. 3, this phase constructs a set of sliced
unit web apps using dynamically collected states. More specifically, it divides
the collected states into EH units, and then, for each EH unit u, it constructs
an initial summary $\hat{s}^u_I$ that contains merged values of non-local variables
from the states in u. As discussed in Sect. 2.1, an event handler consists of three
components: an event target, an event type, and a callback function. Thus, we
design an EH unit u with an abstract event target $\hat{o}$, an event type $\tau$, and a
program point p:


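$$
\begin{array}{l}
u \in U = \mathit{AbsEventTarget} \times \mathit{EventType} \times P \\
\hat{o} \in \mathit{AbsEventTarget} = \mathit{DOMTreePosition} \uplus A \\
\tau \in \mathit{EventType}
\end{array}
$$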


While we use the same concrete event types and program points for EHs, we
abstract concrete event targets to keep the number of event targets modest. We
assume that the static analyzer expresses analysis results as summaries. A summary
$\hat{s}$ is a map from a pair of a program point and a context to an abstract heap:


$$\hat{s} \in \hat{S} = P \times \mathit{Context} \rightarrow \hat{H} \qquad c \in \mathit{Context}$$


where Context is parameterized by an input static analyzer of EHA. 

For each dynamically collected state s = (p, h) with an event target o and
an event type $\tau$, both contained in h, Unit Web App Builder calculates an EH
unit u as follows:


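$$
u = \alpha_s(s) = (\alpha_o(o), \tau, p)
\quad\text{where}\quad
\alpha_o(o) =
\begin{cases}
\mathit{DOMTreePosition}(o) & \text{if } o \text{ is attached to the DOM} \\
o & \text{otherwise}
\end{cases}
$$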
where DOMTreePosition(o) represents the position of o in the DOM tree as a
sequence of child indices from the root node of the DOM. Then, it constructs an
initial summary $\hat{s}^u_I$ for each unit u as follows:


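$$
\hat{s}^u_I(p, c) =
\begin{cases}
\hat{h}^u_{\mathit{init}} & \text{if } p \text{ is the global entry point} \wedge c = \epsilon \\
\bot_{\hat{H}} & \text{otherwise}
\end{cases}
$$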


The initial summary maps all pairs of program points and contexts to the heap
bottom $\bot_{\hat{H}}$, denoting no information, except for the pair of the global entry
program point and the empty context $\epsilon$, which it maps to the initial abstract heap

$$\hat{h}^u_{\mathit{init}} = \bigsqcup \{\, \alpha_h(h_i) \mid s_i = (p_i, h_i) \in \mathit{CollectedStates}(\sigma) \wedge \alpha_s(s_i) = u \,\}$$

That is, the initial abstract heap for a unit u is the join of the abstractions of the
heaps in all collected states that are mapped to u. The heap abstraction $\alpha_h$
and the abstract heap join $\bigsqcup$ are parameterized by the input static analyzer.
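The following TypeScript sketch illustrates the unit-building step under strong simplifying assumptions. An abstract heap is modeled as a map from variable names to sets of observed values, so $\alpha_h$ is a pointwise singleton lift and $\bigsqcup$ is pointwise set union; EHA itself delegates both to the input static analyzer, whose domains are much richer. DOM-attached targets are abstracted to their child-index path and detached targets keep their address. All type and function names (`unitKey`, `alphaHeap`, `buildInitialHeaps`, ...) are hypothetical.

```typescript
// Simplified concrete event target: attached to the DOM (child-index path
// from the root) or a detached object identified by its address.
type Target =
  | { kind: "dom"; path: number[] }
  | { kind: "obj"; address: number };

type Prim = number | string;

interface CollectedState {
  p: string;               // program point of the callback entry
  target: Target;          // event target o, found in the heap h
  eventType: string;       // event type τ, e.g. "click"
  heap: Map<string, Prim>; // simplified concrete heap: variable -> value
}

// Toy abstract heap: variable -> set of values it may hold.
type AbstractHeap = Map<string, Set<Prim>>;

// α_o: DOM-attached targets become their DOMTreePosition; others stay as is.
function alphaTarget(o: Target): string {
  return o.kind === "dom" ? `dom:${o.path.join(".")}` : `obj:${o.address}`;
}

// α_s(s) = (α_o(o), τ, p), encoded as a string so it can serve as a map key.
function unitKey(s: CollectedState): string {
  return `${alphaTarget(s.target)}|${s.eventType}|${s.p}`;
}

// α_h: lift each concrete value to a singleton set.
function alphaHeap(h: Map<string, Prim>): AbstractHeap {
  const abs: AbstractHeap = new Map();
  h.forEach((v, x) => abs.set(x, new Set<Prim>([v])));
  return abs;
}

// ⊔: pointwise union of the value sets (inputs are not mutated).
function join(a: AbstractHeap, b: AbstractHeap): AbstractHeap {
  const out: AbstractHeap = new Map();
  const addAll = (x: string, vs: Set<Prim>): void => {
    const cur = out.get(x) ?? new Set<Prim>();
    vs.forEach((v) => cur.add(v));
    out.set(x, cur);
  };
  a.forEach((vs, x) => addAll(x, vs));
  b.forEach((vs, x) => addAll(x, vs));
  return out;
}

// Initial abstract heap per unit: join the abstracted heaps of all
// collected states that are mapped to the same unit u.
function buildInitialHeaps(states: CollectedState[]): Map<string, AbstractHeap> {
  const units = new Map<string, AbstractHeap>();
  states.forEach((s) => {
    const u = unitKey(s);
    const abs = alphaHeap(s.heap);
    const prev = units.get(u);
    units.set(u, prev === undefined ? abs : join(prev, abs));
  });
  return units;
}

// Two clicks on the same button and one key press on a detached object.
const initialHeaps = buildInitialHeaps([
  { p: "onClick", target: { kind: "dom", path: [1, 0] }, eventType: "click", heap: new Map<string, Prim>([["score", 0]]) },
  { p: "onClick", target: { kind: "dom", path: [1, 0] }, eventType: "click", heap: new Map<string, Prim>([["score", 1]]) },
  { p: "onKey", target: { kind: "obj", address: 7 }, eventType: "keydown", heap: new Map<string, Prim>([["score", 1]]) },
]);
```

The two clicks fall into the same unit, so its initial heap records that `score` may be 0 or 1; the key press forms a separate unit.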


Static Analysis Phase. Now, the static analysis phase analyzes each sliced unit
web app one by one and detects bugs in it. Let SA denote the static analyzer
that EHA takes as input. Without loss of generality, we assume that SA
performs a whole-program analysis that computes the analysis result $\hat{s}_{\mathit{final}}$
from the initial summary $\hat{s}_I$ as the least fixpoint of a semantic transfer
function $\hat{F}$, $\hat{s}_{\mathit{final}} = \mathit{leastFix}\ \lambda\hat{s}.(\hat{s}_I \sqcup_{\hat{S}} \hat{F}(\hat{s}))$, and then reports alarms for
possible bugs. We call the instance of EHA that takes SA as its input static
analyzer EHA_SA. For each EH unit u, EHA_SA performs an EH-based analysis
that computes the analysis result $\hat{s}^u_{\mathit{final}}$ from the initial summary $\hat{s}^u_I$ con-
structed during the unit building phase, using the same transfer function:
$\hat{s}^u_{\mathit{final}} = \mathit{leastFix}\ \lambda\hat{s}.(\hat{s}^u_I \sqcup_{\hat{S}} \hat{F}(\hat{s}))$. It also reports alarms for possible
bugs in each unit u.
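Both analyses compute a least fixpoint of the form $\mathit{leastFix}\ \lambda\hat{s}.(\hat{s}_0 \sqcup \hat{F}(\hat{s}))$. When the lattice has finite height and $\hat{F}$ is monotone, Kleene iteration from $\bot$ reaches it. The sketch below illustrates only this fixpoint computation on a toy lattice (finite sets of function names ordered by inclusion, with union as the join) and a hypothetical transfer function over a tiny call graph. It is not SAFE's abstract semantics, and its "summary" is much simpler than $P \times \mathit{Context} \rightarrow \hat{H}$.

```typescript
type Summary = Set<string>; // toy summary: names of functions found reachable

function union(a: Summary, b: Summary): Summary {
  const out = new Set<string>();
  a.forEach((x) => out.add(x));
  b.forEach((x) => out.add(x));
  return out;
}

function sameSet(a: Summary, b: Summary): boolean {
  if (a.size !== b.size) return false;
  let same = true;
  a.forEach((x) => {
    if (!b.has(x)) same = false;
  });
  return same;
}

// leastFix λs.(init ⊔ F(s)) by Kleene iteration from ⊥ (the empty set).
// With a monotone F over subsets of a finite set, the iterates form an
// increasing chain, so the loop terminates.
function leastFix(init: Summary, f: (s: Summary) => Summary): Summary {
  let current: Summary = new Set<string>();
  for (;;) {
    const next = union(init, f(current));
    if (sameSet(next, current)) return current;
    current = next;
  }
}

// Hypothetical call graph of a unit web app: function -> callees.
const callGraph: Record<string, string[]> = {
  onClick: ["updateScore", "render"],
  updateScore: ["render"],
  render: [],
  unusedHelper: ["render"],
};

// Monotone transfer function F: add the callees of every reachable function.
function transfer(s: Summary): Summary {
  const out = new Set<string>();
  s.forEach((fn) => (callGraph[fn] ?? []).forEach((c) => out.add(c)));
  return out;
}

// Start from the unit's entry handler only, as in an EH-based analysis.
const reachable = leastFix(new Set(["onClick"]), transfer);
```

Starting from a single handler, the iteration stabilizes after three steps with `onClick`, `updateScore`, and `render`; `unusedHelper` is never reached.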


Alarm Aggregation Phase. The final phase combines all bug reports from the sliced
unit web apps and constructs a final bug report. Because the source locations of
bugs in a report from a unit web app differ from those in the original input
web app, Alarm Aggregator maps them back to the original locations. Since a single
source location in the original web app may appear multiple times in differently
sliced unit web apps, Alarm Aggregator also merges bug reports for the same source location.


4 Implementation 


This section describes how we implemented the concrete data representation and
each module shown as a dark box in Fig. 3 in our prototype.


Instrumentor. The main idea of the instrumentor is similar to that of Jalangi [20],
a JavaScript dynamic analysis framework, and we implemented the rules partially
shown in Fig. 4. An instrumented web app collects states during execution by
stringifying them and writing them to files. Dynamically collected information
may consist of ordinary JavaScript values or of built-in objects of JavaScript engines
or browsers, which are often implemented in non-JavaScript, native languages.
Because such built-in values are inaccessible from JavaScript code, we omit their
values in the collected states. In contrast, ordinary JavaScript values are
stringified in JSON format. A primitive value is stringified by JSON.stringify and
stored in ValueMap. An object value is stored in two places, its pointer in StorageMap
and its pointer identifier in ValueMap, and its property values are also recur-
sively stringified and stored in StorageMap. The stringified document, ValueMap, and
StorageMap are written to files at the end of execution, and Unit Web App Builder
converts them to states in the unit building phase.
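The sketch below shows one way such a serialization could work. The names ValueMap and StorageMap come from the text; their representation as plain records, the `{"ref": id}` pointer encoding, and the identity-based id assignment are assumptions. Primitive values are stringified with `JSON.stringify`; each object receives a pointer id and its properties are stringified recursively, so shared and cyclic references are stored only once. Functions, `undefined`, and symbols are simply encoded as `null` in this sketch.

```typescript
// Hypothetical encoding: every value is stored as a JSON string; an object
// is represented by a pointer id written as {"ref":"obj1"}.
type Encoded = string;

interface Snapshot {
  valueMap: Record<string, Encoded>;                   // variable -> encoded value
  storageMap: Record<string, Record<string, Encoded>>; // pointer id -> property -> encoded value
}

function snapshot(vars: Record<string, unknown>): Snapshot {
  const valueMap: Record<string, Encoded> = {};
  const storageMap: Record<string, Record<string, Encoded>> = {};
  const ids = new Map<object, string>(); // object identity -> pointer id

  function encode(v: unknown): Encoded {
    if (typeof v === "object" && v !== null) {
      let id = ids.get(v);
      if (id === undefined) {
        id = `obj${ids.size + 1}`;
        ids.set(v, id); // register before recursing so that cycles terminate
        const props: Record<string, Encoded> = {};
        storageMap[id] = props;
        const record = v as Record<string, unknown>;
        Object.keys(record).forEach((key) => {
          props[key] = encode(record[key]);
        });
      }
      return JSON.stringify({ ref: id });
    }
    if (typeof v === "number" || typeof v === "string" || typeof v === "boolean") {
      return JSON.stringify(v);
    }
    // null, functions, undefined, and symbols: encoded as null here.
    return JSON.stringify(null);
  }

  Object.keys(vars).forEach((variable) => {
    valueMap[variable] = encode(vars[variable]);
  });
  return { valueMap, storageMap };
}

// A game board with an array and a cyclic self-reference.
const board: { cells: number[]; self?: unknown } = { cells: [1, 2] };
board.self = board;
const snap = snapshot({ score: 3, title: "Mancala", board });
```

Here `board` becomes `obj1` and its `cells` array becomes `obj2`; the self-reference is stored as a pointer back to `obj1` instead of being expanded again.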


    var DOMTokenList = function () {...};  ...    // modeling code for built-in objects
    var _obj1 = function _handler() {...};  ...   // object declaration
    if (_BoolTop) { _obj3[prop] = ...; }          // object property initialization
    else { _obj3[prop] = ...; }
    var _handler = _obj1;                         // variable declaration/initialization
    var _target = _obj2;
    var _arguments = _obj3;
    _handler.apply(_target, _arguments);          // callback function call


Fig. 5. Contents in a JavaScript file of a unit web app 


Unit Web App Builder. In our prototype implementation, Unit Web App Builder
parses the collected states in JSON format and constructs a unit web app as
multiple HTML files and one JavaScript file. The single JavaScript file contains
all the information needed to build an initial abstract heap, as shown in Fig. 5.
It contains modeling code for built-in objects at the top, declares the objects
recorded in StorageMap and initializes their properties, and then declares and
initializes non-local variables. At the bottom, it calls the handler function.

Starting from the three variables _handler, _target, and _arguments, we fill in
the contents of a unit web app using the collected states. For each variable, we
get its value from the collected states and construct the corresponding JavaScript
code. When the value of a variable is a primitive value, we create a corresponding
code fragment as a string literal. For an object value, we get the value from
StorageMap using its pointer id and repeat the process for its property values.
For a function object value, we also repeat the process for its non-local variables.
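This recursive construction can be sketched as follows, mirroring the structure of Fig. 5: object declarations first (so cyclic references can be initialized), then property initialization, then the three variables and the callback call. The input format is an assumption: values are JSON strings, an object is written as `{"ref": id}`, and StorageMap maps each id to the encoded values of its properties; the special key `__code__`, holding a function object's source text, is hypothetical. Built-in modeling code, the `_BoolTop` branches, and closures' non-local variables are not generated in this sketch.

```typescript
type StorageMap = Record<string, Record<string, string>>;

// Returns the pointer id if the encoded value is an object reference.
function refId(encoded: string): string | undefined {
  const parsed: unknown = JSON.parse(encoded);
  if (typeof parsed === "object" && parsed !== null && "ref" in parsed) {
    return String((parsed as { ref: unknown }).ref);
  }
  return undefined;
}

// A primitive stays a literal; an object becomes a reference to _<id>.
function toExpr(encoded: string): string {
  const id = refId(encoded);
  return id === undefined ? encoded : `_${id}`;
}

// All object ids reachable from the roots, in depth-first order.
function reachableIds(roots: string[], storage: StorageMap): string[] {
  const seen: string[] = [];
  const visit = (encoded: string): void => {
    const id = refId(encoded);
    if (id === undefined || seen.indexOf(id) >= 0) return;
    seen.push(id);
    const props = storage[id] ?? {};
    Object.keys(props).forEach((k) => visit(props[k]));
  };
  roots.forEach((r) => visit(r));
  return seen;
}

function generateUnitScript(handler: string, target: string, args: string, storage: StorageMap): string {
  const lines: string[] = [];
  const ids = reachableIds([handler, target, args], storage);
  // Object declarations.
  ids.forEach((id) => {
    const props = storage[id] ?? {};
    const code = "__code__" in props ? (JSON.parse(props["__code__"]) as string) : "{}";
    lines.push(`var _${id} = ${code};`);
  });
  // Object property initialization.
  ids.forEach((id) => {
    const props = storage[id] ?? {};
    Object.keys(props)
      .filter((k) => k !== "__code__")
      .forEach((k) => lines.push(`_${id}[${JSON.stringify(k)}] = ${toExpr(props[k])};`));
  });
  // Variable declaration/initialization and the callback function call.
  lines.push(`var _handler = ${toExpr(handler)};`);
  lines.push(`var _target = ${toExpr(target)};`);
  lines.push(`var _arguments = ${toExpr(args)};`);
  lines.push("_handler.apply(_target, _arguments);");
  return lines.join("\n");
}

const storage: StorageMap = {
  obj1: { __code__: JSON.stringify("function (n) { this.clicks = this.clicks + n; }") },
  obj2: { clicks: "1" },
  obj3: { "0": "1", length: "1" }, // array-like arguments object
};
const script = generateUnitScript('{"ref":"obj1"}', '{"ref":"obj2"}', '{"ref":"obj3"}', storage);
```

Running the generated script calls the handler once with `_target` as `this`, which increments `clicks` from 1 to 2.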


Alarm Aggregator. The alarm aggregator maintains a mapping between different 
source locations and eliminates duplicated alarms. It should map between loca- 
tions in the original web app and in sliced unit web apps. Our implementation 
keeps track of corresponding AST nodes in different web apps, and utilizes the 
information for mapping locations. It identifies duplicated alarms by string com- 
parison of their bug messages and locations after mapping the source locations. 
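A minimal sketch of this aggregation follows. It assumes that each unit report lists alarms with a message and a location in the unit web app, together with a per-unit map (derived from the tracked AST correspondence) from unit locations to original locations. The `Alarm` shape, the `LocationMap` type, and the `file:line:col` strings are hypothetical. Dropping alarms without an original counterpart (for example, alarms in generated modeling code) is a design choice of this sketch, not something stated in the text.

```typescript
interface Alarm {
  message: string;  // bug message reported by the static analyzer
  location: string; // source location, e.g. "app.js:12:3"
}

// Maps a location in a unit web app to the corresponding original location.
type LocationMap = Map<string, string>;

interface UnitReport {
  alarms: Alarm[];
  locations: LocationMap;
}

function aggregate(reports: UnitReport[]): Alarm[] {
  const seen = new Set<string>();
  const result: Alarm[] = [];
  reports.forEach(({ alarms, locations }) => {
    alarms.forEach((a) => {
      const original = locations.get(a.location);
      if (original === undefined) return;     // no counterpart in the original app
      const key = `${a.message}@${original}`; // string comparison after mapping
      if (seen.has(key)) return;              // duplicate from another unit
      seen.add(key);
      result.push({ message: a.message, location: original });
    });
  });
  return result;
}

const unitA: UnitReport = {
  alarms: [{ message: "undefined converted to number", location: "unitA.js:40:3" }],
  locations: new Map([["unitA.js:40:3", "app.js:12:3"]]),
};
const unitB: UnitReport = {
  alarms: [
    { message: "undefined converted to number", location: "unitB.js:55:3" },
    { message: "property read on null", location: "unitB.js:70:9" },
  ],
  locations: new Map([["unitB.js:55:3", "app.js:12:3"], ["unitB.js:70:9", "app.js:30:9"]]),
};
const finalReport = aggregate([unitA, unitB]);
```

The first alarm of `unitB` maps to the same original location and message as the alarm of `unitA`, so the final report contains two alarms rather than three.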
5 Experimental Evaluation
In this section, we evaluate EHA_SAFE, an instantiation of EHA with manual event
generation and SAFE [12], to answer the following research questions:


Assuming that as many dynamic events as possible are provided:


- RQ1. Full Coverage: How many event flows does the EH-based analysis
cover compared with the whole-program analysis?

- RQ2. Precision: How precise is the EH-based analysis compared with the
whole-program analysis?

- RQ3. Scalability: What is the execution time of each phase in the analyses?

- RQ4. Partial Coverage: How many event flows does the EH-based analysis
cover for apps whose whole-program analysis times out?


5.1 Experimental Setup 


We studied 8 open-source game web apps [8], which were used in the evaluation
of SAFE. They have various buttons and show event-dependent behaviors. The
first two columns of Table 1 show the names and lines of code of the apps,
respectively. The first four apps do not use any JavaScript libraries, and the
remaining apps use the jQuery library version 2.0.3. They are all cross-platform
apps that can run in Chrome, Chrome extension, and Tizen environments.

To perform the experiments, we instantiated EHA with two inputs. As the
Event Generator input, we chose manual event generation by an undergraduate
researcher who was unfamiliar with EHA. He was instructed to explore the behaviors
of the web apps as much as possible, and he could check the number of functions
called during execution as guidance. To make the execution environment
simple enough to reproduce multiple times, we collected dynamic states from
a browser without any cached data. As the Static Analyzer input, we use SAFE


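Table 1. Analysis coverage of SAFE and EHA_SAFE ("Handler" = number of analyzed registered event handler functions; "Ftn" = number of analyzed functions).

| App | LoC | Handler: Both | Handler: SAFE only | Handler: EHA_SAFE only | Ftn: Both | Ftn: SAFE only | Ftn: EHA_SAFE only | Ftn: Total |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| 01 HangOnMan | 1326 | 20 | 0 | 11 | 67 | 3 | 19 | 89 |
| 02 MakeAMonster | 1405 | 22 | 0 | 5 | 63 | 5 | 7 | 75 |
| 03 Mancala | 1546 | 28 | 0 | 4 | 67 | 4 | 5 | 76 |
| 04 Rabbit | 1403 | 34 | 0 | 2 | 76 | 22 | 2 | 100 |
| 05 Bubblewrap | 7220 | - | - | 8 | - | - | 10 | 10 |
| 06 CountingBeads | 6949 | - | - | 9 | - | - | 11 | 11 |
| 07 MemoryGameForOlderKids | 6955 | - | - | 7 | - | - | 9 | 9 |
| 08 WordsSwarm | 7557 | - | - | 9 | - | - | 48 | 48 |
| Total | 34363 | 104 | 0 | 55 | 273 | 34 | 111 | 418 |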
because it can analyze more JavaScript web apps than other existing analyzers,
thanks to its state-of-the-art DOM tree abstraction [14,15], and it supports a bug
detector [16]. In the execution phase, we ran the apps with Chrome on a 2.9 GHz
quad-core Intel Core i7 with 16 GB of memory. The other phases were conducted
on Ubuntu 16.04.1 with an Intel Core i7 and 32 GB of memory.


5.2 Answers to RQs 


Answer to RQ1. To assess analysis coverage, we counted the analyzed functions
and the true positives of SAFE and EHA_SAFE. Because SAFE could not
analyze the 4 apps that use jQuery within the timeout of 72 h, we considered only
the other apps for SAFE.

Table 1 summarizes the analyzed functions. The 3rd to the 5th
columns show the numbers of registered event handler functions analyzed by
both, by SAFE only, and by EHA_SAFE only, respectively. Similarly, the 6th to the
8th columns show the numbers of all functions analyzed by both, by SAFE only, and
by EHA_SAFE only, respectively. For registered event handler functions, EHA_SAFE
outperforms SAFE: for the four apps that SAFE could analyze, EHA_SAFE analyzed
22 handler functions that SAFE missed, whereas SAFE analyzed none that EHA_SAFE
missed. Even though SAFE was designed to be sound, it missed some behaviors. Our
investigation showed that this unsoundness was caused by incomplete DOM
modeling. For all analyzed functions, more than 75% of the functions analyzed for
each of these apps were analyzed by both. EHA_SAFE analyzed more functions than
SAFE for the first 3 subjects, because incomplete DOM modeling caused SAFE to
miss event registrations. On the other hand, SAFE analyzed more functions for
the 4th subject because EHA_SAFE missed flows during the execution phase. We
studied the analysis result of the 4th subject in more detail and found flows that
resume previously suspended execution by using cached data in a localStorage
object. EHA_SAFE could not analyze these flows because its collected states do not
contain cached data, while SAFE could use a sound model of localStorage. Lastly,
EHA_SAFE did not miss any true positives that SAFE detected, and it detected
four more true positives in common functions, as shown in Table 2, which implies
that EHA_SAFE analyzed execution flows in those functions that SAFE missed.
We explain Table 2 in more detail in the next answer.


Answer to RQ2. To compare analysis precision, we counted the false positives
(FPs) in the alarm reports of SAFE and EHA_SAFE. Note that app developers may
not consider true positives (TPs) to be "bugs". For example, SAFE reports a
warning when the undefined value is implicitly converted to a number, because this
is a well-known error-prone pattern, but the conversion may be intended by the
developer. Thus, TPs denote alarms that are reproducible in concrete executions,
while FPs denote alarms that cannot be reproduced in any feasible execution. As
for RQ1, we compare the analysis precision for the four apps that do not use jQuery.

Tables 2 and 3 divide alarms into three categories: common alarms reported by both
SAFE and EHA_SAFE, different alarms in functions analyzed by both, and different
alarms in functions analyzed by only one of them. Table 2 shows the numbers of TPs and
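Table 2. Alarms reported by SAFE and EHA_SAFE (Common = common alarms reported by both; C-SAFE and C-EHA = different alarms in functions analyzed by both, reported by SAFE and EHA_SAFE, respectively; D-SAFE and D-EHA = different alarms in functions analyzed by only SAFE or only EHA_SAFE, respectively).

| App Id | Common #TP | Common #FP | C-SAFE #TP | C-SAFE #FP | C-EHA #TP | C-EHA #FP | D-SAFE #TP | D-SAFE #FP | D-EHA #TP | D-EHA #FP |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 01 | 1 | 3 | 0 | 10 | 3 | 2 | 0 | 0 | 0 | 2 |
| 02 | 1 | 2 | 0 | 0 | 1 | 8 | 0 | 5 | 0 | 1 |
| 03 | 1 | 3 | 0 | 30 | 0 | 6 | 0 | 0 | 0 | 2 |
| 04 | 3 | 7 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 05 | - | - | - | - | - | - | - | - | 0 | 1 |
| 06 | - | - | - | - | - | - | - | - | 0 | 3 |
| 07 | - | - | - | - | - | - | - | - | 0 | 1 |
| 08 | - | - | - | - | - | - | - | - | 0 | 1 |
| Total | 6 | 15 | 0 | 41 | 4 | 16 | 0 | 5 | 0 | 11 |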
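Table 3. False alarms categorized by causes (Common = common alarms; C-SAFE and C-EHA = different alarms in functions analyzed by both, reported by SAFE and EHA_SAFE, respectively; D-SAFE and D-EHA = different alarms in functions analyzed by only SAFE or only EHA_SAFE, respectively).

| Cause | Common | C-SAFE | C-EHA | D-SAFE | D-EHA |
|---|---:|---:|---:|---:|---:|
| Infeasible event flow | - | 40 | - | 0 | - |
| ECMAScript 5 | 1 | 0 | 0 | 0 | 0 |
| Object join | 0 | 0 | 3 | 0 | 0 |
| Handler unit abstraction | - | - | 3 | - | 0 |
| Omitted property | - | - | 0 | - | 2 |
| Absence of DOM model | 14 | 1 | 10 | 5 | 9 |
| Total | 15 | 41 | 16 | 5 | 11 |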


FPs for each app, and Table 3 further categorizes the false alarms by their causes.
Out of 21 common alarms, 6 are TPs and 15 are FPs. Among the 15 common FPs,
14 are due to the absence of DOM modeling and 1 is due to the unsupported getter
and setter semantics of ECMAScript 5. For the functions analyzed by both, the
analyzers may report different alarms because they are based on different abstract
heaps. We observed that 40 FPs from SAFE are due to its over-approximated
modeling of the event system. In particular, the FPs in apps 01 and 03 arise because
top-level variables are initialized when non-load event handler functions are
called, which implies that the event modeling of Fig. 1(b) would have a similar
imprecision problem. In contrast, EHA_SAFE reported only 16 FPs, most of which
(10 FPs) are due to the absence of DOM modeling. The remaining six FPs, three from
object joins and three from the handler unit abstraction, stem from an inherent
problem of static analysis: merging multiple values loses precision. Finally, for the
functions analyzed by only one analyzer, all reported alarms are FPs, caused by the
absence of DOM modeling and by properties omitted in the EHA_SAFE implementa-
tion. In short, EHA_SAFE could partially analyze more subjects than SAFE, and for
commonly analyzed functions it improved the analysis precision by finding four more
TPs and fewer FPs (16 versus 41). In particular, its handler unit abstraction
produced only three FPs, considerably fewer than the 40 FPs caused by SAFE's
over-approximated event modeling, without missing any TPs.


Answer to RQ3. To compare the analysis scalability, we measured the execution
time of each phase for both analyzers, as summarized in Table 4.


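Table 4. Execution time (seconds) of each phase for SAFE and EHA_SAFE (Exec. = execution phase, Unit build = unit building phase, and SA = static analysis phase of EHA_SAFE; x = not finished within 72 h).

| Id | SAFE Total | SAFE Top-Level | SAFE Event Loop | Exec. Total | Exec. #Call | Exec. Ave. | Unit build Total | SA Total | SA #EH | SA #TO | SA Ave. |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 01 | 375.7 | 8.9 | 366.8 | 465.41 | 682 | 0.68 | 10.0 | 33038.4 | 130 | 9 | 96.6 |
| 02 | 282.0 | 8.2 | 273.8 | 252.86 | 135 | 1.87 | 6.0 | 6379.7 | 33 | 0 | 70.4 |
| 03 | 850.2 | 15.5 | 834.7 | 82.70 | 168 | 0.49 | 2.0 | 7894.1 | 43 | 3 | 68.8 |
| 04 | 1276.6 | 325.3 | 951.3 | 302.36 | 589 | 0.51 | 2.1 | 16223.9 | 95 | 7 | 54.2 |
| 05 | x | 137.3 | x | 1713.61 | 151 | 11.35 | 287.2 | 66238.5 | 63 | 55 | 10.4 |
| 06 | x | 86.9 | x | 383.08 | 85 | 4.51 | 221.5 | 17257.1 | 27 | 9 | 146.5 |
| 07 | x | 119.3 | x | 2836.05 | 242 | 11.72 | 348.2 | 104583.5 | 94 | 87 | 7.7 |
| 08 | x | 82.4 | x | 1074.73 | 146 | 7.36 | 1158.5 | 39506.3 | 41 | 32 | 33.5 |
| Ave. | 696.1 | 98.0 | 606.6 | 888.85 | 275 | 3.24 | 254.4 | 3076.5 | 66 | 25 | 76.0 |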


For SAFE, we measured the time taken to analyze the entire code, the top-level
code, and the event loops: Total = Top-Level + Event Loop. For the four subjects
that do not use any JavaScript libraries, the total analysis took at most 1276.6 s,
of which 951.3 s was spent analyzing event loops. While SAFE finished analyzing
the top-level code of the other subjects, which use jQuery, in at most 137.3 s, it
could not finish analyzing their entire code within the 72 h (259,200 s) timeout.

For EHA_SAFE, Table 4 omits the instrumentation phase and the alarm aggregation
phase, because their maximum execution times, 10.3 s and 4.9 s respectively, are
much smaller than those of the other phases. For the execution phase, we present
the overhead of collecting states:


$$\text{EHA}_{\text{SAFE}}\ \text{(Execution Phase)}:\quad \text{Total} = \#\text{Call} \times \text{Ave.}$$


The 6th column presents the number of event handler function calls that Event
Generator executed; collecting states pauses each event handler function call for
3.24 s on average. To understand the performance overhead of the instrumentation,
we measured its slowdown by replacing all the instrumented helper functions with
a function with an empty body. On the SunSpider benchmark, Jalangi showed a
30× slowdown and EHA_SAFE a 178× slowdown on average. We observed that
collecting non-local variables for each function incurs substantial overhead, which
grows with the number of function calls.
The unit building phase takes time to generate unit web app code. Our
investigation showed that this time heavily depends on the size of the collected data.
For the static analysis phase, we measured the analysis time of the unit web apps
excluding timeouts (TO):


$$\text{EHA}_{\text{SAFE}}\ \text{(Static Analysis Phase)}:\quad \text{Total} = (\#\text{EH} - \#\text{TO}) \times \text{Ave.} + 1200 \times \#\text{TO}$$


We analyzed each unit web app with a timeout of 1200 s. While the 02 app has no
timeouts, the 07 app has 87 timeouts out of 94 unit web apps. On average, the
analysis of 38% (25/66) of the unit web apps timed out. Note that even for the
first four apps, for which SAFE finished the analysis, EHA_SAFE had some timeouts.
We conjecture that SAFE finished quickly because it missed some flows due to
unsupported DOM modeling. By contrast, because EHA_SAFE analyzes more flows
using dynamically collected data, it had several timeouts.
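As a restatement of the two cost formulas, the following sketch computes them for hypothetical inputs; the numbers in the example are not taken from Table 4. The constant 1200 is the per-unit timeout stated above, and the function names are ours.

```typescript
const TIMEOUT_SECONDS = 1200; // per-unit timeout of the static analysis phase

// Execution phase: Total = #Call × Ave.
function executionTotal(calls: number, avgSeconds: number): number {
  return calls * avgSeconds;
}

// Static analysis phase: Total = (#EH − #TO) × Ave. + 1200 × #TO,
// where Ave. is the average time of the units that did not time out.
function staticAnalysisTotal(units: number, timeouts: number, avgSeconds: number): number {
  return (units - timeouts) * avgSeconds + TIMEOUT_SECONDS * timeouts;
}

// Hypothetical app: 10 units, 2 of which time out, 50 s per finished unit.
const example = staticAnalysisTotal(10, 2, 50); // 8 × 50 + 1200 × 2 = 2800
```

The formula shows why timeouts dominate the cost: each timed-out unit contributes the full 1200 s regardless of how quickly the other units finish.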


Answer to RQ4. To see how many event flows EHA_SAFE covers with a limited
time budget, let us consider the four apps in Tables 1 and 4 that SAFE did not
finish within 72 h. EHA_SAFE finished 19% (42/225) of their units within the
1200 s timeout, as shown in Table 4, and the average analysis time excluding
timeouts was 76.0 s. Because this implies that web apps have event flows that can
be analyzed in about 76 s, it may be worthwhile to analyze such simple event flows
quickly first to find bugs in them. Starting with 42 units, EHA_SAFE covered 78
functions, as shown in Table 1. While SAFE could not provide any bug reports for
the four apps using jQuery, EHA_SAFE reported 6 alarms from the analyzed functions.


6 Related Work 


Researchers have studied event dependencies to analyze event flows more pre-
cisely. Madsen et al. [13] proposed event-based call graphs, which extend tra-
ditional call graphs with behaviors of event handlers such as event registration and
triggering. While they do not consider analysis of DOM state changes and
event capturing/bubbling behaviors, EHA addresses them by utilizing dynami-
cally collected states. Sung et al. [21] introduced DOM event dependency and
exploited it to test JavaScript web apps. Their tool improved the efficiency of
event testing, but it has not yet been applied to static analysis of event loops.

Taking advantage of both static analysis and dynamic analysis is not a new
idea [5]. For JavaScript analysis, researchers have tried to precisely analyze dynamic
features of JavaScript [7] and DOM values of web apps [23,24]. Alimadadi
et al. [1] proposed a DOM-sensitive change impact analysis for JavaScript web apps.
JavaScript Blended Analysis Framework (JSBAF) [26] collects dynamic traces of
a given app and specializes dynamic features of JavaScript, such as eval calls and
reflective property accesses, using the collected traces. JSBAF analyzes each trace
separately and combines the results, whereas EHA first abstracts the collected states
per EH and then analyzes the units to obtain generalized contexts. Finally, Ko
et al. [11] proposed a tunable static analysis framework that utilizes a lightweight
pre-analysis. Similarly, our work builds an approximation of selected executions by
constructing an initial abstract heap from dynamic information, which enables
the analysis of complex event flows, albeit partially.
7 Conclusion and Future Work


Because existing JavaScript static analyzers conservatively approximate event-
driven flows, even state-of-the-art analyzers often fail to analyze event flows in
web apps within a timeout of several hours. We present EHA, a bug detection
framework that performs a novel EH-based static analysis using dynamically
collected state information. As a general framework, EHA is parameterized by
a way to generate event sequences and by a JavaScript static analyzer. We present
EHA_SAFE, an instantiation of EHA with manual event generation and the SAFE
JavaScript static analyzer. Our experimental evaluation shows that the EH-
based analysis (EHA_SAFE) reduced the false positives that the whole-program
analysis (SAFE) reports due to its over-approximation of the event system.
Moreover, EHA_SAFE finished analyzing partial execution flows of the web apps
that SAFE failed to analyze within the timeout of 72 h. We plan to investigate the
soundness issues caused by the lack of DOM modeling in whole-program analyzers
systematically via dynamic analyses [3,6,25], and to use an automated
testing tool as a dynamic event generator instead of manual generation.
Abstract. Architectural design patterns capture architectural design
experience and provide abstract solutions to recurring architectural 
design problems. Their description is usually expressed informally and it 
is not verified whether the proposed specification indeed solves the orig- 
inal design problem. As a consequence, an architect cannot fully rely 
on the specification when implementing a pattern to solve a certain 
problem. To address this issue, we propose an approach for the speci- 
fication and verification of architectural design patterns. Our approach 
is based on interactive theorem proving and leverages the hierarchical 
nature of patterns to foster reuse of verification results. This
paper presents FACTum, a methodology and corresponding specification
techniques to support the formal specification of patterns. Moreover, it 
describes an algorithm to map a given FACTum specification to a cor- 
responding Isabelle/HOL theory and shows its soundness. Finally, the 
paper demonstrates the approach by verifying versions of three widely 
used patterns: the singleton, the publisher-subscriber, and the black- 
board pattern. 


Keywords: Architectural design patterns · Algebraic specification · Configuration traces


1 Introduction 


Architectural design patterns capture architectural design experience and pro- 
vide abstract solutions to recurring architectural design problems. They are an 
important concept in software engineering and regarded as one of the major 
tools to support an architect in the conceptualization and analysis of software 
systems [1]. The importance of patterns has resulted in a panoply of pattern
descriptions in the literature [1-3]. They usually consist of a description of some
key architectural constraints imposed by the pattern, such as involved data types, types
of components, and assertions about the activation/deactivation of components 
as well as connections between component ports. These descriptions are usually 
highly informal and the claim that they indeed solve a certain design problem 
remains unverified. As a consequence, an architect cannot fully rely on a pat- 
tern’s specification to solve a design problem faced during the development of a 
new architecture. Moreover, verified pattern descriptions are a necessary
precondition for automatic pattern conformance analyses, since missing assertions in a
pattern's specification render their detection impossible. Compared to concrete
architectures, architectural design patterns pose several new challenges to the 
specification as well as the verification: 


- C1: Axiomatic Specifications. Compared to traditional architectural specifi-
cations, specifications of patterns are usually axiomatic, focusing on a few,
but important properties.

- C2: Dynamic Aspects. Pattern specifications usually involve the specification
of dynamic aspects, such as instantiation of components and reconfiguration
of connections.

- C3: Hierarchical Specifications. Pattern specifications usually build on each
other, i.e., the specification of a pattern may instantiate the specification of
another pattern.


This is why traditional techniques for the specification and verification of con- 
crete architectures are not well-suited to be applied for the specification and 
verification of patterns. 

Therefore, we propose an approach for the formal specification and verifi- 
cation of architectural design patterns which is based on interactive theorem 
proving [4]. Our approach is built on top of a pre-existing model of dynamic 
architectures [5,6] and its formalization in Isabelle/HOL [7] which comes with 
a calculus to support reasoning about such architectures [8]. Our approach pro- 
vides techniques to specify patterns and corresponding design problems and 
allows to map a specification to a corresponding Isabelle/HOL theory [9]. The 
theory and the corresponding calculus can then be used to verify that a specifi- 
cation indeed solves the design problem the pattern claims to solve. 

With this paper, we elaborate on our previous work by providing the follow-
ing contributions: First, we present FACTum, a novel approach for the formal
specification of architectural design patterns. Second, we provide an improved
version of the algorithm to map a given FACTum specification to a correspond-
ing Isabelle/HOL theory and show soundness of the mapping. Third, we demon-
strate the approach by specifying and verifying versions of three architectural
design patterns: the singleton pattern, the publisher-subscriber pattern, and the
blackboard pattern.

The remainder of the paper is structured as follows: In Sect. 2, we provide the
necessary background on interactive theorem proving and configuration traces
(our model of dynamic architectures). We then describe our approach to specify
patterns in Sect. 3. To this end, we define the notion of (hierarchical) pattern
specification and demonstrate it by specifying three architectural design pat-
terns. In Sect. 4, we first define the semantics of a pattern specification in terms
of configuration traces. Then, we provide an algorithm to map a given speci-
fication to a corresponding Isabelle/HOL theory and show its soundness, i.e.,
that the semantics of a specification is indeed preserved by the algorithm. We
proceed with an overview of related work in Sect. 5 and conclude the paper in
Sect. 6 with a brief discussion of how the approach addresses the challenges
C1-C3 identified above.
2 Background


In the following, we provide some background on which our work is built.


2.1 Interactive Theorem Proving 


Interactive theorem proving (ITP) is a semi-automatic approach for the devel-
opment of formal theories. To this end, several proof assistants [4] have been
developed to support humans in developing formal proofs. Since our
approach is based on Isabelle/HOL [9], in the following we describe some relevant
features of this specific prover.

In general, Isabelle is an LCF-style [10] theorem prover based on Standard 
ML. It provides a so-called meta-logic on which different object logics are based. 
Isabelle/HOL is one of them, implementing higher-order logic for Isabelle. It 
integrates a prover IDE and comes with an extensive library of theories from 
various domains. New theories are then developed by defining terms of a certain 
type and deriving theorems from these definitions. Data types can be speci- 
fied in Isabelle/HOL in terms of freely generated, inductive data type defini- 
tions [11]. Axiomatic specification of data types is also supported in terms of 
type classes [12]. To support the specification of theories over the data types, 
Isabelle/HOL provides tools for inductive definitions and recursive function def- 
initions. Moreover, Isabelle/HOL provides a structured proof language called 
Isabelle/Isar [13] and a set of logical reasoners to support the verification of the- 
orems. Modularization of theories is achieved through the notion of locales [14] 
in which an interface is specified in terms of sets of functions (called parameters) 
with corresponding assumptions about their behavior. Locales can extend other 
locales and may be instantiated by concrete definitions of the corresponding 
parameters. 


2.2 A Model of Dynamic Architectures 


Since architectures implementing an architectural design pattern (ADP) may be
dynamic as well (in the sense that components of a certain type can be instantiated
over time), our approach is based on a model of dynamic architectures. One way to
model such architectures is in terms of sets of configuration traces [5,6], i.e., streams
[15,16] over architecture configurations. Here, architecture configurations can be
thought of as snapshots of the architecture during execution. Thus, they consist of
a set of (active) components with their ports valuated by messages and connections
between the ports of the components. Moreover, components of a certain type
may be parametrized by a set of messages.


Example 1 (Configuration trace). Assume that A, ..., Z and 1, ..., 9 are mes-
sages. Figure 1 depicts a configuration trace t with corresponding architecture
configurations $t(0) = k_0$, $t(1) = k_1$, and $t(2) = k_2$. Architecture configuration
$k_1$, for example, consists of two active components named $c_1$ and $c_2$. Here,
component $c_1$ is parametrized by {A}, has one input port ip valuated with {8},
and three output ports $o_0$, $o_1$, $o_2$, valuated with {1}, {G}, and {7}, respectively.
Fig. 1. Configuration trace with its first three architecture configurations.


Note that the model allows the ports of a component to be valuated by a set of messages, rather than just a single message, at each point in time. To evaluate the behavior of a single component, the model comes with an operator Π_c(t) to extract the behavior of a single component c out of a given configuration trace t.
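To make the model concrete, the following Python sketch represents a configuration trace under simplifying assumptions of our own: only a finite prefix of the stream is kept (a list of configurations), each configuration maps its active components to their port valuations (sets of messages), and the hypothetical function `project` approximates Π_c(t) by collecting c's valuations at those points in time where c is active. It is an illustration only, not the Isabelle/HOL formalization of the model, and the trace values are made up.

```python
from dataclasses import dataclass, field


# Hypothetical encoding of one architecture configuration (a snapshot):
# the keys of `valuation` are exactly the active components, and each port
# of an active component is valuated with a *set* of messages.
@dataclass
class Config:
    valuation: dict[str, dict[str, frozenset]] = field(default_factory=dict)

    def active(self) -> set[str]:
        return set(self.valuation)


def project(trace: list[Config], c: str) -> list[dict[str, frozenset]]:
    """Simplified projection: c's port valuations at each point where c is active."""
    return [k.valuation[c] for k in trace if c in k.valuation]


# A finite prefix t(0), t(1), t(2) of a made-up configuration trace.
t = [
    Config({"c1": {"i0": frozenset({8}), "o0": frozenset({1})}}),
    Config({"c2": {"o0": frozenset({"G"})}}),
    Config({"c1": {"i0": frozenset(), "o0": frozenset({7})}, "c2": {"o0": frozenset()}}),
]
```

Because c1 is inactive at time 1, its projected behavior skips that configuration and has only two entries.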

The model of configuration traces is also implemented by a corresponding Isabelle/HOL theory which is available through the Archive of Formal Proofs [7]. The implementation formalizes a configuration trace as a function trace = nat ⇒ cnf and provides an interface to the model in terms of a locale “dynamic_component”. The locale can be instantiated with components of a dynamic architecture by providing definitions for two parameters:


- tCMP: id × cnf ⇒ cmp: an operator to obtain a component cmp with a certain identifier id from an architecture configuration cnf, and

- active: id × cnf ⇒ bool: a predicate to assert whether a certain component with identifier id is activated within an architecture configuration cnf.


For each dynamic component instantiating the locale, a set of definitions is 
provided to support the specification of its behavior [17]. Moreover, a calculus to 
reason about the behavior of the component in a dynamic context is provided [8]. 


3 Specifying Architectural Design Patterns 


In the following, we describe FACTum, an approach to specify architectural design patterns. To this end, we first provide a definition of the different parts of a pattern specification and then explain each part in more detail. We conclude the section with an exemplary specification of three patterns: the singleton, the publisher subscriber, and the blackboard pattern. Here, the publisher component is modeled as an instance of the singleton, and the blackboard pattern is specified as an instance of the publisher subscriber pattern.


Definition 1 (Pattern specification). A pattern specification is a 5-tuple (VAR, DS, IS, CT, AS), consisting of:

- Variables VAR = (V, V', C, C') with
  • data type variables V and so-called rigid data type variables V' (variables with a fixed interpretation during execution) and
  • component variables C and rigid component variables C'.
- A data type specification DS = (Σ, DA, Gen) with
  • a signature Σ = (S, F, B), containing sorts S and function/predicate symbols F/B for a pattern's data types,
  • a set of data type assertions DA specifying the meaning of the signature symbols in terms of a set of axioms, and
  • a set of generator clauses Gen to construct data types.
- An interface specification IS = (P, tp, IF) with
  • a set of ports P and a corresponding type function tp: P → S which assigns a sort to each port,
  • a set of interfaces (CP, IP, OP) ∈ IF with input ports IP ⊆ P and output ports OP ⊆ P, as well as a set of configuration parameters CP ⊆ P.
- A component type specification (CT_if)_{if∈IF} which assigns assertions CT_if about the behavior of a component to each interface if ∈ IF.
- A set of architectural assertions AS, which specify activation and deactivation of components and connections between the components' ports.
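As a reading aid, the five parts of Definition 1 can be mirrored by a plain Python data structure. This is a sketch under our own assumptions: the class and field names are hypothetical, and all signature symbols and assertions are kept as uninterpreted strings. The only thing it checks is the syntactic side condition that every interface uses declared ports, i.e., CP, IP, OP ⊆ P.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interface:
    name: str
    cp: frozenset = frozenset()  # configuration parameters CP
    ip: frozenset = frozenset()  # input ports IP
    op: frozenset = frozenset()  # output ports OP


@dataclass
class PatternSpec:
    # VAR = (V, V', C, C'): variable name -> sort (data) or interface name (components)
    v: dict = field(default_factory=dict)
    v_rigid: dict = field(default_factory=dict)
    c: dict = field(default_factory=dict)
    c_rigid: dict = field(default_factory=dict)
    # DS = (Sigma, DA, Gen); symbols and axioms are plain strings in this sketch
    sorts: set = field(default_factory=set)
    symbols: dict = field(default_factory=dict)     # function/predicate symbol -> type
    axioms: list = field(default_factory=list)      # DA
    generators: list = field(default_factory=list)  # Gen
    # IS = (P, tp, IF): P is the key set of tp
    tp: dict = field(default_factory=dict)
    interfaces: list = field(default_factory=list)
    # CT: interface name -> behavior assertions; AS: architectural assertions
    ct: dict = field(default_factory=dict)
    arch: list = field(default_factory=list)

    def well_formed(self) -> bool:
        """Check CP, IP, OP ⊆ P for every interface (CP, IP, OP) ∈ IF."""
        ports = set(self.tp)
        return all((i.cp | i.ip | i.op) <= ports for i in self.interfaces)


# The singleton pattern: one interface, no ports, two activation assertions.
singleton = PatternSpec(
    c={"c": "Singleton"},
    c_rigid={"c'": "Singleton"},
    interfaces=[Interface("Singleton")],
    arch=["□(∃c: ‖c‖)", "∃c': □(∀c: ‖c‖ → c = c')"],
)
```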


Since a pattern specification may also instantiate other pattern specifications, we require that for each instantiated pattern (VAR', DS', IS', CT', AS'), the specification contains an additional port instantiation (η_if')_{if'∈IF'}, with injective functions η_if': CP' ∪ IP' ∪ OP' → CP ∪ IP ∪ OP, such that η_if'(CP') ⊆ CP, η_if'(IP') ⊆ IP, and η_if'(OP') ⊆ OP, for some (CP, IP, OP) ∈ IF. Moreover, we require that for each (CP', IP', OP') ∈ IF' and p' ∈ CP' ∪ IP' ∪ OP' the corresponding data type refines the type of p', i.e., that tp(η_if'(p')) refines tp'(p').
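The side conditions on a port instantiation are mechanical enough to check directly. The Python sketch below tests one mapping η_if' against them: it must be total on CP' ∪ IP' ∪ OP', injective, respect the partition into parameters, inputs, and outputs, and map every port to one with a refining type. The function name is hypothetical, and type refinement is passed in as a caller-supplied predicate because the paper does not fix how refinement is decided. The example uses the blackboard interface instantiating a publisher with sb ↦ rp and nt ↦ cs, as described in Sect. 3.6, together with a made-up refinement table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Interface:
    cp: frozenset = frozenset()  # configuration parameters
    ip: frozenset = frozenset()  # input ports
    op: frozenset = frozenset()  # output ports


def is_port_instantiation(
    eta: dict,                     # port of the instantiated interface -> port of the host
    inst: Interface, tp_inst: dict,
    host: Interface, tp_host: dict,
    refines: Callable[[str, str], bool],
) -> bool:
    ports = inst.cp | inst.ip | inst.op
    if set(eta) != set(ports):                # eta is defined on CP' ∪ IP' ∪ OP'
        return False
    if len(set(eta.values())) != len(eta):    # eta is injective
        return False
    for src, dst in ((inst.cp, host.cp), (inst.ip, host.ip), (inst.op, host.op)):
        if any(eta[p] not in dst for p in src):  # eta(CP') ⊆ CP, eta(IP') ⊆ IP, eta(OP') ⊆ OP
            return False
    # tp(eta(p')) must refine tp'(p') for every instantiated port p'
    return all(refines(tp_host[eta[p]], tp_inst[p]) for p in ports)


publisher = Interface(ip=frozenset({"sb"}), op=frozenset({"nt"}))
tp_pub = {"sb": "subscription(id, ℘(evt))", "nt": "evt × msg"}
bb = Interface(ip=frozenset({"rp", "ns"}), op=frozenset({"op", "cs"}))
tp_bb = {"rp": "PROB × ℘(PROB)", "ns": "PROB × SOL", "op": "PROB", "cs": "PROB × SOL"}

# Hypothetical refinement table (an assumption, not taken from the paper).
REFINES = {("PROB × ℘(PROB)", "subscription(id, ℘(evt))"), ("PROB × SOL", "evt × msg")}


def refines(new_sort: str, old_sort: str) -> bool:
    return (new_sort, old_sort) in REFINES


ok = is_port_instantiation({"sb": "rp", "nt": "cs"}, publisher, tp_pub, bb, tp_bb, refines)
```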

In the following, we explain the different parts of a FACTum specification in 
more detail. 


3.1 Specifying Data Types 


The data types involved in a pattern specification can be specified using alge- 
braic specification techniques [18,19]. Algebraic specifications usually consist of 
two parts: First, a signature Σ = (S, F, B), specifying a set of sorts S and function/predicate symbols F/B, typed by a list of sorts. In addition, an algebraic specification provides a set of axioms DA to assign meaning to the symbols of Σ. These axioms specify the characteristic properties of the data types used by a pattern specification and are formulated over the symbols of F and B, respectively. Finally, a data type specification may require that all elements of the corresponding type are constructed by corresponding constructor terms Gen, i.e., that each element of the corresponding type is built up from symbols of Gen.
3.2 Specifying Interfaces


The specification of interfaces then proceeds in two steps: First, ports are specified by providing a set of ports P and a corresponding mapping tp: P → S to specify which types of data may be exchanged through each port. Then, a set of interfaces (CP, IP, OP) is specified by declaring input ports IP ⊆ P, output ports OP ⊆ P, and a set of configuration parameters CP ⊆ P. Configuration parameters are a way to parametrize components of a certain type; they can be thought of as ports with a predefined value which is fixed for each component.

Interfaces can then be specified using so-called configuration diagrams consisting of a graphical depiction of the involved interfaces (see Sect. 3.6 for examples). In such a diagram, each interface is denoted by a name followed by a list of configuration parameters (enclosed between ‘(’ and ‘)’). Input and output ports are represented by empty and filled circles, respectively.


3.3 Specifying Component Types 


Component types are specified by assigning assertions about the input/output behavior to the interfaces. Configuration parameters can be used to distinguish between different components of a certain type.

The assertions are expressed in terms of linear temporal logic formulæ [20] formulated over the signature Σ by using port names as free variables. For example, the term “□(c.p = POS → c.o ≥ 1)” denotes an assertion that port o of component c, for which configuration parameter p has the value POS (for positive), is guaranteed to be greater than or equal to 1 for the whole execution of the system.
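The example assertion can be evaluated mechanically on a recorded behavior. The following Python sketch encodes temporal formulæ as predicates over a position in a finite prefix and checks □(c.p = POS → c.o ≥ 1). Two assumptions simplify it: each port carries a single value rather than a set of messages, and □ ranges only over the prefix. A finite prefix can therefore refute the assertion, but it cannot establish it for an infinite execution. All names are hypothetical.

```python
from typing import Callable

# Truth value of a formula at position i of a finite prefix of states.
Formula = Callable[[list, int], bool]


def atom(pred: Callable[[dict], bool]) -> Formula:
    return lambda tr, i: pred(tr[i])


def implies(f: Formula, g: Formula) -> Formula:
    return lambda tr, i: (not f(tr, i)) or g(tr, i)


def always(f: Formula) -> Formula:
    # □f restricted to a finite prefix: f holds at every position j ≥ i of the prefix
    return lambda tr, i: all(f(tr, j) for j in range(i, len(tr)))


# □(c.p = POS → c.o ≥ 1) for one component c, with p a configuration parameter
assertion = always(implies(atom(lambda s: s["p"] == "POS"), atom(lambda s: s["o"] >= 1)))
```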


3.4 Specifying Activation and Connection Assertions 


Finally, a set of assertions about the activation and deactivation of components as well as assertions about connections between component ports are specified. Both types of assertions may be expressed in terms of so-called configuration trace assertions, i.e., linear temporal logic formulæ with special predicates to denote activation of components and port connections. Here, c.p denotes the valuation of port p of a component c (where ċ.p denotes that port p of component c is valuated at all), ‖c‖ denotes that a component c is currently active, and c.p ⇝ c'.p' denotes that input port p of component c is connected to output port p' of component c'.
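The three special predicates can be read as simple queries on a single architecture configuration. The Python sketch below implements ‖c‖, ċ.p, and c.p ⇝ c'.p' over a hypothetical encoding of configurations. It then checks a connection assertion of the general shape □(‖c‖ ∧ ‖c2‖ ∧ (port p2 of c2 is valuated) → c.p ⇝ c2.p2) on a finite prefix, following the reading of ⇝ as "input port on the left, output port on the right". The encoding and all names are our own assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    # component -> port -> set of messages; the keys are exactly the active components
    val: dict = field(default_factory=dict)
    # (c, p, c2, p2) encodes c.p ⇝ c2.p2: input port p of c is connected to output port p2 of c2
    conn: set = field(default_factory=set)


def is_active(k: Config, c: str) -> bool:                     # ‖c‖
    return c in k.val


def is_valuated(k: Config, c: str, p: str) -> bool:           # port p of c carries some message
    return bool(k.val.get(c, {}).get(p))


def connected(k: Config, c: str, p: str, c2: str, p2: str) -> bool:  # c.p ⇝ c2.p2
    return (c, p, c2, p2) in k.conn


def check_connection(t: list, c: str, p: str, c2: str, p2: str) -> bool:
    """□(‖c‖ ∧ ‖c2‖ ∧ (c2.p2 valuated) → c.p ⇝ c2.p2), evaluated on a finite prefix t."""
    return all(
        not (is_active(k, c) and is_active(k, c2) and is_valuated(k, c2, p2))
        or connected(k, c, p, c2, p2)
        for k in t
    )


wired = Config({"pub": {"sb": frozenset()}, "s1": {"sb": frozenset({"sub"})}}, {("pub", "sb", "s1", "sb")})
unwired = Config({"pub": {"sb": frozenset()}, "s1": {"sb": frozenset({"sub"})}})
silent = Config({"pub": {"sb": frozenset()}, "s1": {"sb": frozenset()}})
```

With c = pub, c2 = s1, and p = p2 = sb, this has the same shape as a publisher requiring a connection from every sending subscriber.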


3.5 Specifying Pattern Instantiations 


As described above, pattern specifications may be built on top of other pattern specifications by instantiating their component types. Such instantiations can be directly specified in a pattern's configuration diagram by annotating the corresponding interfaces. To denote that a certain component type t of the specification is an instance of component type t' (from the instantiated pattern), we simply write t: t' followed by a corresponding port mapping [p'_1, ..., p'_n ↦ p_1, ..., p_n], which assigns a port of t to each port of t'.

ASpec Singleton for Singleton
var c: Singleton
rig c': Singleton

$$\Box\big(\exists c\colon \|c\|\big) \tag{1}$$
$$\exists c'\colon \Box\big(\forall c\colon (\|c\| \rightarrow c = c')\big) \tag{2}$$

(a) Configuration diagram. (b) Activation specification.

Fig. 2. Specification of the singleton pattern.

Diagram Publisher-Subscriber (import Singleton; interfaces Publisher: Singleton and Subscriber)

DTSpec subscription(id, evt)
generated by sub id evt, unsub id evt

PSpec Publisher-Subscriber
sb: subscription(id, ℘(evt))
nt: evt × msg

(a) Configuration diagram. (b) Data type specification. (c) Port specification.

Fig. 3. Specification of the publisher subscriber pattern.


3.6 Example: An Initial Pattern Hierarchy 


In the following, we demonstrate the FACTum approach by specifying variants 
of three well-known patterns: the singleton pattern, the publisher subscriber 
pattern, and the blackboard pattern. The publisher component of the publisher subscriber pattern is modeled as an instance of the singleton, whereas the blackboard pattern is specified by instantiating the publisher subscriber pattern.


Singleton. The singleton pattern is a pattern for dynamic architectures in 
which, for a certain type of component, it is desired to have only one active 
instance at all points in time. Figure 2 depicts a possible specification of the pattern in terms of a configuration diagram and a corresponding activation specification. Since the pattern is only concerned with activation of components, we need neither data types nor port specifications for this pattern.
Interfaces. The interface is specified by the configuration diagram in Fig. 2a: It
consists of a single interface Singleton and does not require any special ports. 


Architectural Assertions. Activation assertions are formalized by the specification depicted in Fig. 2b: With Eq. (1) we require that at every point in time there exists an active component c, and with Eq. (2) we require this component to be unique. In our version of the singleton, the singleton component is not allowed to change over time. This is why variable c' is declared to be rigid in Fig. 2b. Other versions of the singleton, in which the singleton may change over time, are possible as well.
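The interplay of the two activation assertions can be explored on finite prefixes. In the Python sketch below, each architecture configuration is abstracted to the set of identifiers of its active components (a simplification of our own; the function names are hypothetical). The sketch also illustrates the role of rigidity: because c' is chosen once for the whole prefix, Eqs. (1) and (2) together force one and the same component to be active at every point in time.

```python
# Each configuration is abstracted to the set of identifiers of its active components.
def eq1(t: list) -> bool:
    """□(∃c: ‖c‖): at every point of the prefix some component is active."""
    return all(len(active) > 0 for active in t)


def eq2(t: list) -> bool:
    """∃c': □(∀c: ‖c‖ → c = c'): a single, rigid c' is the only component ever active."""
    return len(set().union(*t)) <= 1


def always_same(t: list) -> bool:
    """∃c: □‖c‖: one and the same component is active at every point of the prefix."""
    return not t or bool(set.intersection(*t))


stable = [{"c"}, {"c"}, {"c"}]
changing = [{"a"}, {"b"}]         # violates Eq. (2): the singleton changes over time
gap = [{"c"}, set(), {"c"}]       # violates Eq. (1): no active component at time 1
```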


Publisher Subscriber. We now proceed by specifying a version of the publisher subscriber pattern. Such patterns are used for architectures in which so-called subscriber components can subscribe to certain messages from other, so-called publisher components. Figure 3 depicts a possible specification of the pattern in terms of a data type specification, a port specification, and a corresponding configuration diagram.


Data Types. In a publisher subscriber pattern we usually have two types of messages: subscriptions and unsubscriptions. Figure 3b depicts the corresponding data type specification. Subscriptions are modeled as parametric data types over two type parameters: a type id for component identifiers and some type evt denoting events to subscribe to. The data type is freely generated by the constructor terms “sub id evt” and “unsub id evt”, meaning that every element of the type has the form “sub id evt” or “unsub id evt”.
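Python has no built-in algebraic data types, so the sketch below approximates the freely generated type subscription(id, evt) with two frozen dataclasses, one per constructor term. Being generated by sub and unsub shows up as a case analysis over exactly two forms. Freeness shows up as structural equality: distinct constructor terms denote distinct values. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

Id = TypeVar("Id")
Evt = TypeVar("Evt")


@dataclass(frozen=True)
class Sub(Generic[Id, Evt]):
    """Constructor term `sub id evt`."""
    id: Id
    evt: Evt


@dataclass(frozen=True)
class Unsub(Generic[Id, Evt]):
    """Constructor term `unsub id evt`."""
    id: Id
    evt: Evt


# subscription(id, evt): every value is built by exactly one of the two constructors.
Subscription = Union[Sub, Unsub]


def describe(s: Subscription) -> str:
    # Case analysis over the generators covers every element of the type.
    if isinstance(s, Sub):
        return f"{s.id} subscribes to {s.evt}"
    if isinstance(s, Unsub):
        return f"{s.id} unsubscribes from {s.evt}"
    raise TypeError(f"not a subscription: {s!r}")
```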


Ports. Two port types are specified over these data types by the specification given in Fig. 3c: a type sb, used to exchange subscriptions to sets of events, and a type nt, used to exchange messages associated with an event.


Interfaces. The configuration diagram in Fig. 3a specifies the interfaces of the two types of components: An interface Publisher is defined with an input port sb to receive subscriptions and an output port nt to send out notifications. Moreover, an interface Subscriber is defined with an input port nt receiving notifications and an output port sb to send out subscriptions. As stated in the beginning, we want a publisher to be unique and activated, which is why it is specified as Publisher: Singleton, meaning that it is considered to be an instance of the Singleton type of the specification of the singleton pattern.


Architectural Assertions. Activation assertions for publisher subscriber architectures are mainly inherited from the singleton pattern: since a publisher is specified to be a singleton, a publisher component is unique and always activated. Moreover, two connection assertions for publisher subscriber architectures are specified in Fig. 4: Eq. (3) requires a publisher's input port sb to be connected to the corresponding output port of every active subscriber which sends some message. Equation (4), on the other hand, requires a subscriber's input port nt to be connected to the corresponding output port of the publisher whenever the latter sends a message to which the subscriber is subscribed.

ASpec Publisher-Subscriber for Publisher-Subscriber
var s: Subscriber
    p: Publisher
    m: msg
    E: ℘(evt)
rig s': Subscriber

$$\Box\big(\|p\| \wedge \|s\| \wedge \dot{s}.sb \rightarrow p.sb \rightsquigarrow s.sb\big) \tag{3}$$
$$\Box\Big(\|s'\| \wedge (\exists E\colon \mathit{sub}\ s'\ E \in s'.sb \wedge e \in E) \rightarrow \big((\|p\| \wedge \|s'\| \wedge (e, m) \in p.nt \rightarrow s'.nt \rightsquigarrow p.nt)\ \mathcal{W}\ (\|s'\| \wedge (\exists E\colon \mathit{unsub}\ s'\ E \in s'.sb \wedge e \in E))\big)\Big) \tag{4}$$

Fig. 4. Architectural constraints for the publisher subscriber pattern.


Blackboard. We conclude our example by specifying a dynamic version of the 
blackboard pattern. A blackboard architecture is usually used for the task of 
collaborative problem solving, i.e., a set of components work together to solve 
an overall, complex problem. Our specification of the pattern is depicted in Fig. 5 
and consists of a data type specification, port specification, and corresponding 
configuration diagram. 


Data Types. Blackboard architectures usually work with problems and solutions for them. Figure 5b provides a specification of the corresponding data types. We denote by PROB the set of all problems and by SOL the set of all solutions. Complex problems consist of subproblems which can be complex themselves. To solve a problem, its subproblems have to be solved first. Therefore, we assume the existence of a subproblem relation ≺ ⊆ PROB × PROB. For complex problems, the details of the relation may not be known in advance. Indeed, one of the benefits of a blackboard architecture is that a problem can be solved even without knowing the exact nature of this relation in advance. However, the subproblem relation has to be well-founded (Eq. (5)) for a problem to be solvable. In particular, we do not allow for cycles in the transitive closure of ≺. While there may be different approaches to solve a problem (i.e., several ways to split a problem into subproblems), we assume, without loss of generality, that the final solution for a problem is always unique. Thus, we assume the existence of a function solve: PROB → SOL which assigns the correct solution to each problem. Note, however, that it is not known in advance how to compute this function; computing it is indeed one of the reasons for using this pattern.
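For a finite set of problems, well-foundedness of ≺ coincides with the absence of cycles, which can be checked by attempting a topological sort. For an infinite PROB, acyclicity alone does not suffice, because infinite descending chains must also be excluded. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+). It assumes, purely for illustration, that ≺ is fully known and finite, which the pattern itself does not require. Pairs (p, q) stand for p ≺ q, and the helper names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def solving_order(subproblem_of: set) -> list:
    """Order problems so that subproblems come first; (p, q) in the set means p ≺ q.

    Raises graphlib.CycleError if the (finite) relation contains a cycle.
    """
    ts = TopologicalSorter()
    for p, q in subproblem_of:
        ts.add(q, p)  # q can only be solved after its subproblem p
    return list(ts.static_order())


def is_well_founded(subproblem_of: set) -> bool:
    """On a finite set of problems, ≺ is well-founded iff it has no cycles."""
    try:
        solving_order(subproblem_of)
        return True
    except CycleError:
        return False
```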
Diagram Blackboard (import Publisher-Subscriber; interfaces BB: Publisher [sb, nt ↦ rp, cs] and KS(prob): Subscriber [nt, sb ↦ cs, rp])

DTSpec ProbSol imports SET
≺: PROB × PROB
solve: PROB → SOL
$$\mathit{well\text{-}founded}(\prec) \tag{5}$$

PSpec BPort
rp: PROB × ℘(PROB)
ns, cs: PROB × SOL
op, prob: PROB

(a) Configuration diagram. (b) Data type specification. (c) Port specification.

Fig. 5. Specification of the blackboard pattern.


Ports. In Fig. 5c, we specify four ports for the pattern:

- rp is used to exchange a problem p ∈ PROB which a knowledge source is able to solve, together with a set of subproblems P ⊆ PROB the knowledge source requires to be solved first.

- ns is used to exchange a problem p ∈ PROB solved by a knowledge source, together with the corresponding solution s ∈ SOL.

- op is used to exchange a set P ⊆ PROB of all the problems which still need to be solved.

- cs is used to exchange solutions s ∈ SOL for problems p ∈ PROB.

Moreover, a configuration parameter prob is specified to parametrize knowledge sources according to the problems p ∈ PROB they can solve.


Interfaces. A blackboard pattern usually involves two types of components: 
blackboards and knowledge sources. The corresponding interfaces are specified 
by the configuration diagram in Fig. 5a. Since our version of the blackboard pat- 
tern is specified to be an instance of the publisher subscriber pattern, we import 
the corresponding pattern specification in the header of the diagram. We then 
specify two interfaces. The blackboard interface is denoted BB and is declared 
to be an instance of a Publisher component in a publisher subscriber pattern. It 
consists of two input ports rp and ns to receive required subproblems and new 
solutions. Moreover, it specifies two output ports op and cs to communicate cur- 
rently open problems and solutions for all currently solved problems. Thereby, 
port rp is specified to be an instance of port sb of a publisher and port cs to be 
an instance of a publisher’s nt port. 

The interface for knowledge sources is denoted KS and is declared to be an instance of a Subscriber component in a publisher subscriber pattern. Note that each knowledge source can only solve certain problems, which is why a knowledge source is parameterized by a problem “prob”. The specification of ports mirrors the corresponding specification of the blackboard interface. Thus, a knowledge source is required to have two input ports op and cs to receive currently open problems and solutions for all currently solved problems, and two output ports rp and ns to communicate required subproblems and new solutions. Here, port rp is specified to be an instance of a subscriber's sb port and port cs to be an instance of a subscriber's nt port.

BSpec Blackboard for BB of Blackboard
var p: PROB
    P: PROB SET
rig p': PROB
    s': SOL

$$\Box\big((p', s') \in ns \rightarrow \Diamond((p', s') \in cs)\big) \tag{6}$$
$$\Box\big((p, P) \in rp \rightarrow (\forall p' \in P\colon \Diamond(p' \in op))\big) \tag{7}$$
$$\Box\big(p' \in op \rightarrow (p' \in op\ \mathcal{W}\ (p', solve(p')) \in cs)\big) \tag{8}$$

Fig. 6. Specification of behavior for blackboard components.


Component Types. A blackboard provides the current state towards solving the 
original problem and forwards problems and solutions from knowledge sources. 
Figure 6 provides a specification of the blackboard’s behavior in terms of three 
behavior assertions: 


- If a solution s' to a subproblem p' is received on its input port ns, then it is eventually provided at its output port cs (Eq. (6)).

- If, on its input port rp, it gets notified that solutions for some subproblems P are required in order to solve a certain problem p, these problems are eventually provided at its output port op (Eq. (7)).

- A problem p' is provided at its output port op as long as it is not solved (Eq. (8)).


Note that the last assertion (Eq. (8)) is formulated using a weak until operator which is defined as follows:

$$\varphi\ \mathcal{W}\ \psi \stackrel{\text{def}}{=} (\varphi\ \mathcal{U}\ \psi) \vee \Box\varphi.$$
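To see why the weak until suits Eq. (8), the following Python sketch evaluates φ W ψ = (φ U ψ) ∨ □φ on finite prefixes. It assumes that a finite prefix stands in for an infinite execution, so □φ only means "φ holds until the end of the prefix". The problem "p1" and its solution "s1" are made up. An open problem that is never solved still satisfies the weak until, whereas the strong until would fail.

```python
from typing import Callable

Formula = Callable[[list, int], bool]  # truth value at position i of a finite prefix


def always(f: Formula) -> Formula:
    return lambda tr, i: all(f(tr, j) for j in range(i, len(tr)))


def until(f: Formula, g: Formula) -> Formula:
    """f U g: g holds at some k ≥ i and f holds at every j with i ≤ j < k."""
    def holds(tr: list, i: int) -> bool:
        for k in range(i, len(tr)):
            if g(tr, k):
                return True
            if not f(tr, k):
                return False
        return False
    return holds


def weak_until(f: Formula, g: Formula) -> Formula:
    """f W g := (f U g) ∨ □f."""
    return lambda tr, i: until(f, g)(tr, i) or always(f)(tr, i)


# Body of Eq. (8) for a made-up problem "p1" with solve("p1") = "s1".
def is_open(tr: list, i: int) -> bool:
    return "p1" in tr[i]["op"]


def is_solved(tr: list, i: int) -> bool:
    return ("p1", "s1") in tr[i]["cs"]


stays_open = weak_until(is_open, is_solved)
```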


A knowledge source receives open problems via op and solutions for other problems via cs. It might contribute to the solution of the original problem by solving currently open subproblems. Figure 7 provides a specification of the knowledge source's behavior in terms of four behavior assertions:


- If a knowledge source (able to solve a problem pp) requires some subproblems P to be solved in order to solve pp and it gets solutions for all these subproblems p' on its input port cs, then it eventually solves pp and provides the solution on its output port ns (Eq. (9)).

- To solve a problem pp, a knowledge source requires solutions only for smaller problems p ∈ P (Eq. (10)).

- A knowledge source will eventually communicate its ability to solve an open problem pp via its output port rp (Eq. (11)).

- A knowledge source does not unsubscribe from receiving solutions for subproblems it required until it has indeed received these solutions (Eq. (12)).


160 D. Marmsoler 


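BSpec Knowledge Source for ks = KS(pp) of Blackboard
var p: PROB
    P: ℘(PROB)
rig p': PROB

$$\Box\Big(\forall (pp, P) \in rp\colon \big((\forall p' \in P\colon \Diamond((p', solve(p')) \in cs)) \rightarrow \Diamond((pp, solve(pp)) \in ns)\big)\Big) \tag{9}$$
$$\Box\big(\forall (pp, P) \in rp\colon \forall p \in P\colon p \prec pp\big) \tag{10}$$
$$\Box\big(pp \in op \rightarrow \Diamond(\exists P\colon (pp, P) \in rp)\big) \tag{11}$$
$$\Box\Big(\mathit{sub}\ ks\ P \in rp \rightarrow \big(\neg(\exists P'\colon p \in P' \wedge \mathit{unsub}\ ks\ P' \in rp)\ \mathcal{W}\ (p, solve(p)) \in cs\big)\Big) \tag{12}$$

Fig. 7. Specification of behavior for knowledge source components.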


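ASpec Blackboard for Blackboard
var ks: KS(pp)
    bb: BB
rig ks': KS(pp)

$$\Box\big(\|ks'\| \wedge pp \in ks'.op \rightarrow (\|ks'\|\ \mathcal{W}\ (\|ks'\| \wedge (pp, solve(pp)) \in ks'.ns))\big) \tag{13}$$
$$\Box\big(\|ks\| \wedge \|bb\| \wedge \dot{\mathit{bb}}.op \rightarrow ks.op \rightsquigarrow bb.op\big) \tag{14}$$
$$\Box\big(\|bb\| \wedge \|ks\| \wedge \dot{\mathit{ks}}.ns \rightarrow bb.ns \rightsquigarrow ks.ns\big) \tag{15}$$

Fig. 8. Specification of activation constraints for blackboard architectures.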


Architectural Assertions. Activation constraints for blackboards are mainly inherited from the singleton pattern: since a blackboard is specified to be an instance of a publisher which is again an instance of a singleton, a blackboard component is unique and always activated. Activation constraints for knowledge sources are provided in Fig. 8 by Eq. (13): Whenever a knowledge source (able to solve a problem pp) gets notified about a request to solve pp, it stays active until pp is indeed solved. Connection assertions for the blackboard pattern are mainly inherited from the corresponding specification of the publisher subscriber pattern (for ports rp and cs, respectively). Two additional assertions, however, are provided in Fig. 8: with Eq. (14) we require the input ports op of active knowledge sources to be connected to the corresponding output port of the blackboard, and with Eq. (15) we require a similar property for port ns, i.e., the blackboard's input port ns has to be connected to the output ports ns of active knowledge sources.


4 Verifying Architectural Design Patterns 


In the last section we presented FACTum, a methodology and corresponding techniques to specify architectural design patterns. So far, we relied on an intuitive understanding of the semantics of the techniques. In the following, we first provide a more formal definition of the semantics of a FACTum specification. Then, we describe an algorithm to map a given specification to a corresponding Isabelle/HOL theory and we show soundness of the algorithm.
4.1 Semantics of Pattern Specifications


The semantics of a pattern specification is given in terms of sets of configuration 
traces introduced in Sect. 2. 


Definition 2 (Semantics of Pattern Specification). The semantics of a pattern specification (VAR, DS, IS, CT, AS) is given by a 5-tuple (A, 𝒫, T, 𝒞, AT), consisting of:

- an algebra A = ((A_s)_{s∈S}, (f^A)_{f∈F}, (b^A)_{b∈B}) for Σ,

- a set of ports 𝒫 with cardinality greater than or equal to the cardinality of P,

- port typing T: 𝒫 → ℘(M) with M = ⋃_{s∈S} A_s,

- a nonempty set of component identifiers 𝒞_if for each component interface if ∈ IF, and

- an architecture AT ∈ DAS,

such that for all port interpretations δ: P → 𝒫 (injective mappings which respect tp and T), variable interpretations ι: V → A and ι': V' → A, and component variable interpretations κ: C → 𝒞 and κ': C' → 𝒞 (respecting interface types) the following conditions hold:

- A is an algebra for the data type specification: A, ι ⊨ DS,

- the projection to the behavior of a component c for every configuration trace t of the architecture satisfies the corresponding behavior specification: ∀c ∈ 𝒞_if, t ∈ AT: Π_c(t), δ ⊨ CT_if, and

- all configuration traces t of the architecture satisfy the architectural assertions: ∀t ∈ AT: t, ι, κ ⊨ AS.


4.2 Mapping to Isabelle/HOL 


Algorithm 1 describes how to systematically transfer a pattern specification to a corresponding Isabelle/HOL theory. In general, the transformation is done in four main steps: (i) the specified data types are transferred to corresponding Isabelle/HOL data type specifications; (ii) an Isabelle locale is created for the pattern, which imports the locales of all instantiated patterns; (iii) specifications of component behavior are added as assumptions; and (iv) activation and connection assertions are added as assumptions.

The following soundness criterion guarantees that Algorithm 1 indeed pre- 
serves the semantics of a pattern specification. 


Theorem 1 (Soundness of Algorithm 1). For every pattern specification PT and model T of the Isabelle/HOL locale (as specified in [21]) generated by Algorithm 1, there exists a T' such that T' ⊨ PT (as defined by Definition 2) and T' is isomorphic to T; and vice versa.


Note that the generated theory is based on Isabelle/HOL's implementation of configuration traces [7]. Thus, a calculus is instantiated for each component type which provides a set of rules to reason about the specification of the behavior of components of that type.
Algorithm 1. Mapping a pattern specification to an Isabelle/HOL Theory.

Input: (VAR, DS, IS, CT, AS) {pattern specification according to Definition 1}
Output: An Isabelle/HOL theory for the specification

1: create Isabelle/HOL data type specification for DS
2: create Isabelle/HOL locale for the pattern
3: for all interfaces i = (CP, IP, OP) ∈ IF do
4:   if i instantiates a component of another pattern then
5:     import the corresponding locale
6:     create instance of ports according to the port instantiation for i
7:   else
8:     import locale “dynamic_component” of theory “Configuration_Traces” [8]
9:   end if
10:  create instance of locale parameters tCMP and active
11:  for all configuration parameters p ∈ CP which are not instances do
12:    create locale parameter p of type tp(p)
13:    create locale assumption “∀x. ∃c. x = p(c)”
14:  end for
15:  for all ports p ∈ IP ∪ OP which are not instances do
16:    create locale parameter p of type tp(p)
17:  end for
18:  for all behavior assertions b ∈ CT_i do
19:    create locale assumption for b using def. of theory “Configuration_Traces” [8]
20:  end for
21: end for
22: for all activation/connection assertions c ∈ AS do
23:   create locale assumption for c
24: end for
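The control flow of Algorithm 1 can be traced with a small Python sketch that walks over a specification and emits one schematic line per step. The emitted lines are placeholders mirroring the algorithm's steps: they are not valid Isabelle/HOL syntax, and the input encoding, field names, and function name are all hypothetical. The example shows the key effect of instantiation: ports of BB that instantiate publisher ports (rp and cs) are not declared again, whereas the new ports ns and op become locale parameters.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Iface:
    name: str
    cp: list = field(default_factory=list)        # configuration parameters
    ip: list = field(default_factory=list)        # input ports
    op: list = field(default_factory=list)        # output ports
    instance_of: Optional[str] = None             # instantiated component type, if any
    port_map: dict = field(default_factory=dict)  # port of the instantiated type -> own port
    behavior: list = field(default_factory=list)  # behavior assertions CT_i


def generate(pattern: str, datatypes: list, ifaces: list, tp: dict, arch: list) -> list:
    """Emit a schematic outline (not valid Isabelle syntax) following Algorithm 1."""
    out = [f"datatype {d}" for d in datatypes]                    # line 1
    out.append(f"locale {pattern}")                               # line 2
    for i in ifaces:                                              # line 3
        if i.instance_of:                                         # lines 4-6
            out.append(f"  import {i.instance_of}")
            out += [f"  {own} instantiates {p}" for p, own in i.port_map.items()]
        else:                                                     # line 8
            out.append(f"  import dynamic_component for {i.name}")
        out.append(f"  instantiate tCMP, active for {i.name}")    # line 10
        instances = set(i.port_map.values())
        for p in i.cp:                                            # lines 11-14
            if p not in instances:
                out.append(f"  fixes {p} :: {tp[p]}")
                out.append(f"  assumes ∀x. ∃c. x = {p}(c)")
        for p in i.ip + i.op:                                     # lines 15-17
            if p not in instances:
                out.append(f"  fixes {p} :: {tp[p]}")
        out += [f"  assumes {b}" for b in i.behavior]             # lines 18-20
    out += [f"  assumes {c}" for c in arch]                       # lines 22-24
    return out


bb = Iface("BB", ip=["rp", "ns"], op=["op", "cs"],
           instance_of="Publisher-Subscriber.Publisher",
           port_map={"sb": "rp", "nt": "cs"})
tp = {"rp": "PROB × PROB set", "ns": "PROB × SOL", "op": "PROB", "cs": "PROB × SOL"}
outline = generate("Blackboard", ["ProbSol"], [bb], tp, [])
```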


4.3 Example: Pattern Hierarchy 


Algorithm 1 can be used to transfer a given pattern specification to a corresponding Isabelle/HOL theory where it is subject to formal verification. This is demonstrated by applying it to the specification of the singleton, publisher subscriber, and blackboard pattern presented in Sect. 3.6. To demonstrate the verification capabilities, we then prove one characteristic property for each pattern. The corresponding Isabelle/HOL theory files are provided online [22].


Singleton. We first come up with a basic property for singleton components 
which ensures that there exists indeed a unique component of the corresponding 
type which is always activated: 


$$\exists c\colon \Box\|c\|. \tag{16}$$


Publisher Subscriber. Let us now turn to the publisher subscriber pattern. First of all, remember that the publisher component was specified to be an instance of the singleton pattern, which is why all results from the verification of the singleton pattern are lifted to the publisher component. Thus, we get an equivalent result as Eq. (16) for free. Moreover, we can use the additional assertions imposed by the specification to come up with another property for the publisher subscriber pattern which guarantees that a subscriber indeed receives all the messages to which it is subscribed:


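$$\Box\Big(\|c\| \wedge \mathit{sub}\ c\ E \in c.sb \rightarrow \big(((e, m) \in p.nt \wedge e \in E \rightarrow (e, m) \in c.nt)\ \mathcal{W}\ (\mathit{unsub}\ c\ E' \in c.sb \wedge e \in E')\big)\Big). \tag{17}$$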


Note that the proof of the above property is based on Eq. (16) inherited from the 
singleton pattern. Indeed, the hierarchical nature of FACTum allows for reuse 
of verification results from instantiated patterns. 


Blackboard. Again, the properties verified for singletons (Eq. (16)) as well as the properties verified for publisher subscriber architectures (Eq. (17)) are inherited by the blackboard specification. In the following, we use these properties to verify another property for blackboard architectures: if for each open (sub-)problem there exists a knowledge source which is able to solve it:

$$\Box\big(\forall p \in bb.op\colon \Diamond(\exists ks\colon \|ks\| \wedge ks.prob = p)\big), \tag{18}$$

then the architecture is guaranteed to eventually solve an overall problem, even if no single knowledge source is able to solve the problem on its own:

$$\Box\big((p', P) \in bb.rp \rightarrow \Diamond((p', solve(p')) \in bb.cs)\big). \tag{19}$$


5 Related Work 
Related work can be found in three different areas. 


Formal Specification of Architectural Styles. Over the last years, several 
approaches emerged to support the formal specification of architectural design 
patterns. One of the first attempts in this direction was Wright [23], which provided the possibility to specify architectural styles, a notion similar to our notion of architectural design patterns. More recent approaches to specify styles are based on the BIP framework [24] and provide logics [25] as well as graphical notation [26] to specify styles. There are, however, two differences between these approaches and the work presented in this paper: One difference concerns the expressive power
of the specification techniques. While the above approaches focus mainly on the 
specification of patterns for static architectures, we allow for the specification of 
static as well as dynamic architectures. Another difference arises from the scope 
of the work. While the above approaches focus mainly on the specification of 
patterns, our focus is more on the verification of such specifications. 
Verification of Architectural Styles and Patterns. Recently, some approaches emerged which focus on the verification of architectural styles and patterns. Kim and Garlan [27], for example, apply the Alloy [28] analyzer to automatically verify architectural styles specified in ACME [29]. A similar approach comes from Wong et al. [30], who apply Alloy to the verification of architectural models. Zhang et al. [31] applied model checking techniques to verify architectural styles formulated in Wright#, an extension of Wright. Similarly, Marmsoler and Degenhardt [32] also apply model checking for the verification of design patterns. Another approach comes from Wirsing et al. [33], where the authors apply rewriting logic to specify and verify cloud-based architectures. While all these approaches focus on the verification of architectures and architectural patterns, they all apply automatic verification techniques. While this has many advantages, verification is limited to properties subject to automatic verification. With our work, we complement these approaches by providing an alternative approach based on interactive theorem proving, rather than automatic verification techniques.


Interactive Theorem Proving for Software Architectures. Another area of related work can be found in applications of interactive theorem proving to software architectures in general. Fensel and Schnogge [34], for example, apply the KIV interactive theorem prover to verify concrete architectures in the area of knowledge-based systems. Their work differs from our work in two main aspects: (i) While they focus on the verification of concrete architectures, we propose an approach to verify architectural patterns. (ii) While they focus on the verification of static architectures, our approach allows for the verification of dynamic architectures. Thus, we complement their work by providing a more general approach. More recently, some attempts were made to apply interactive theorem proving to the verification of architectural connectors. Li and Sun [35], for example, apply the Coq proof assistant to verify connectors specified in Reo [36]. With our work we complement their approach, since we focus on the verification of patterns rather than connectors.

To summarize, to the best of our knowledge, this is the first attempt at applying interactive theorem proving to the verification of architectural design patterns.


6 Conclusion 


With this paper we presented a novel approach for the specification and verification of architecture design patterns. To this end, we provide a methodology and corresponding specification techniques for the specification of patterns in terms of configuration traces. Then, we describe an algorithm to map a given specification to a corresponding Isabelle/HOL theory and show soundness of the algorithm. Our approach can be used to formally specify patterns in a hierarchical way. Using the algorithm, the specification can then be mapped to a corresponding Isabelle/HOL theory where the pattern can be verified using a pre-existing calculus. This is demonstrated by specifying and verifying versions of three architecture patterns: the singleton, the publisher subscriber, and the blackboard. Here, patterns were specified hierarchically, and verification results for lower level patterns were reused for the verification of higher level patterns.
The proposed approach addresses the challenges for pattern verification identified in the introduction as follows:

| | C1 Axiomatic specifications | C2 Dynamic aspects | C3 Hierarchical specifications |
|---|---|---|---|
| Specification | Model-theoretic semantics | Model of dynamic architectures | Structured specifications |
| Verification | Axiomatic reasoning | A calculus to support verification | Import of verification results |


In order to achieve our overall vision of interactive, hierarchical pattern verification [37], future work is needed in two directions. First, we are currently working on an implementation of the approach for the Eclipse Modeling Framework [38], in which a pattern can be specified and a corresponding Isabelle/HOL theory can be generated using the algorithm presented in this paper. Second, we want to lift the verification to the architecture level, hiding the complexity of an interactive theorem prover and interpreting its output at the architecture level.


Acknowledgments. We would like to thank Veronika Bauer, Maximilian Junker, and all the anonymous reviewers of FASE 2018 for their comments and helpful suggestions on earlier versions of this paper. Parts of the work on which we report in this paper were funded by the German Federal Ministry of Education and Research (BMBF) under grant no. 011s16043A.


References 


1. Taylor, R.N., Medvidovic, N., Dashofy, E.M.: Software Architecture: Foundations, Theory, and Practice. Wiley Publishing, Chichester (2009)

2. Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., Stal, M.: Pattern-Oriented Software Architecture: A System of Patterns. Wiley, West Sussex (1996)

3. Shaw, M., Garlan, D.: Software Architecture: Perspectives on an Emerging Discipline, vol. 1. Prentice Hall, Englewood Cliffs (1996)

4. Wiedijk, F. (ed.): The Seventeen Provers of the World. LNCS (LNAI), vol. 3600. Springer, Heidelberg (2006). https://doi.org/10.1007/11542384

5. Marmsoler, D., Gleirscher, M.: On activation, connection, and behavior in dynamic architectures. Sci. Ann. Comput. Sci. 26(2), 187-248 (2016)

6. Marmsoler, D., Gleirscher, M.: Specifying properties of dynamic architectures using configuration traces. In: Sampaio, A., Wang, F. (eds.) ICTAC 2016. LNCS, vol. 9965, pp. 235-254. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46750-4_14

7. Marmsoler, D.: Dynamic architectures. Archive of Formal Proofs, pp. 1-65. Formal proof development, July 2017

8. Marmsoler, D.: Towards a calculus for dynamic architectures. In: Hung, D., Kapur, D. (eds.) ICTAC 2017. LNCS, vol. 10580. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67729-3_6


9. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL: A Proof Assistant for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45949-9

10. Gordon, M.J., Milner, A.J., Wadsworth, C.P.: Edinburgh LCF: A Mechanised Logic of Computation. LNCS, vol. 78. Springer, Heidelberg (1979). https://doi.org/10.1007/3-540-09724-4

11. Berghofer, S., Wenzel, M.: Inductive datatypes in HOL - lessons learned in formal-logic engineering. In: Bertot, Y., Dowek, G., Théry, L., Hirschowitz, A., Paulin, C. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 19-36. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48256-3_3

12. Wenzel, M.: Type classes and overloading in higher-order logic. In: Gunter, E.L., Felty, A. (eds.) TPHOLs 1997. LNCS, vol. 1275, pp. 307-322. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0028402

13. Wenzel, M.: Isabelle/Isar - a generic framework for human-readable proof documents. In: From Insight to Proof - Festschrift in Honour of Andrzej Trybulec, vol. 10, no. 23, pp. 277-298 (2007)

14. Ballarin, C.: Locales and locale expressions in Isabelle/Isar. In: Berardi, S., Coppo, M., Damiani, F. (eds.) TYPES 2003. LNCS, vol. 3085, pp. 34-50. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24849-1_3

15. Broy, M.: A logical basis for component-oriented software and systems engineering. Comput. J. 53(10), 1758-1782 (2010)

16. Broy, M.: A model of dynamic systems. In: Bensalem, S., Lakhnech, Y., Legay, A. (eds.) ETAPS 2014. LNCS, vol. 8415, pp. 39-53. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54848-2_3

17. Marmsoler, D.: On the semantics of temporal specifications of component-behavior for dynamic architectures. In: Eleventh International Symposium on Theoretical Aspects of Software Engineering. Springer (2017)

18. Broy, M.: Algebraic specification of reactive systems. In: Wirsing, M., Nivat, M. (eds.) AMAST 1996. LNCS, vol. 1101, pp. 487-503. Springer, Heidelberg (1996). https://doi.org/10.1007/BFb0014335

19. Wirsing, M.: Algebraic specification. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, pp. 675-788. MIT Press, Cambridge (1990)

20. Manna, Z., Pnueli, A.: The Temporal Logic of Reactive and Concurrent Systems. Springer, New York (1992). https://doi.org/10.1007/978-1-4612-0931-7

21. Wenzel, M., et al.: The Isabelle/Isar reference manual (2004)

22. Marmsoler, D.: Isabelle/HOL theories for the singleton, publisher subscriber, and blackboard pattern. http://www.marmsoler.com/docs/FASE18

23. Allen, R.J.: A formal approach to software architecture. Technical report, DTIC Document (1997)

24. Attie, P., Baranov, E., Bliudze, S., Jaber, M., Sifakis, J.: A general framework for architecture composability. Form. Asp. Comput. 28(2), 207-231 (2016)

25. Mavridou, A., Baranov, E., Bliudze, S., Sifakis, J.: Architecture diagrams: a graphical language for architecture style specification. In: Bartoletti, M., Henrio, L., Knight, S., Vieira, H.T. (eds.) Proceedings of the 9th Interaction and Concurrency Experience. ICE 2016, Heraklion, 8-9 June 2016. EPTCS, vol. 223, pp. 83-97 (2016)

26. Mavridou, A., Baranov, E., Bliudze, S., Sifakis, J.: Configuration logics: modelling architecture styles. In: Braga, C., Ölveczky, P.C. (eds.) FACS 2015. LNCS, vol. 9539, pp. 256-274. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-28934-2_14


27. Kim, J.S., Garlan, D.: Analyzing architectural styles with alloy. In: Proceedings of the ISSTA 2006 Workshop on Role of Software Architecture for Testing and Analysis, pp. 70-80. ACM (2006)

28. Jackson, D.: Alloy: a lightweight object modelling notation. ACM Trans. Softw. Eng. Methodol. (TOSEM) 11(2), 256-290 (2002)

29. Garlan, D.: Formal modeling and analysis of software architecture: components, connectors, and events. In: Bernardo, M., Inverardi, P. (eds.) SFM 2003. LNCS, vol. 2804, pp. 1-24. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39800-4_1

30. Wong, S., Sun, J., Warren, I., Sun, J.: A scalable approach to multi-style architectural modeling and verification. In: Engineering of Complex Computer Systems, pp. 25-34. IEEE (2008)

31. Zhang, J., Liu, Y., Sun, J., Dong, J.S., Sun, J.: Model checking software architecture design. In: High-Assurance Systems Engineering, pp. 193-200. IEEE (2012)

32. Marmsoler, D., Degenhardt, S.: Verifying patterns of dynamic architectures using model checking. In: Proceedings of the International Workshop on Formal Engineering approaches to Software Components and Architectures, FESCA@ETAPS 2017, Uppsala, Sweden, 22 April 2017, pp. 16-30 (2017)

33. Wirsing, M., Eckhardt, J., Mühlbauer, T., Meseguer, J.: Design and analysis of cloud-based architectures with KLAIM and Maude. In: Durán, F. (ed.) WRLA 2012. LNCS, vol. 7571, pp. 54-82. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34005-5_4

34. Fensel, D., Schnogge, A.: Using KIV to specify and verify architectures of knowledge-based systems. In: Automated Software Engineering, pp. 71-80, November 1997

35. Li, Y., Sun, M.: Modeling and analysis of component connectors in Coq. In: Fiadeiro, J.L., Liu, Z., Xue, J. (eds.) FACS 2013. LNCS, vol. 8348, pp. 273-290. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07602-7_17

36. Arbab, F.: Reo: a channel-based coordination model for component composition. Math. Struct. Comput. Sci. 14(3), 329-366 (2004)

37. Marmsoler, D.: Towards a theory of architectural styles. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2014, pp. 823-825. ACM Press (2014)

38. Steinberg, D., Budinsky, F., Merks, E., Paternostro, M.: EMF: Eclipse Modeling Framework. Pearson Education, London (2008)




Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the chapter’s 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. 




Supporting Verification-Driven 
Incremental Distributed Design 
of Components 


Claudio Menghi¹, Paola Spoletini², Marsha Chechik³, and Carlo Ghezzi⁴


1 Chalmers | University of Gothenburg, Gothenburg, Sweden
claudio.menghi@gu.se
2 Kennesaw State University, Marietta, USA
pspoleti@kennesaw.edu
3 University of Toronto, Toronto, Canada
chechik@cs.toronto.edu
4 Politecnico di Milano, Milan, Italy
carlo.ghezzi@polimi.it


Abstract. Software systems are usually formed by multiple components 
which interact with one another. In large systems, components them- 
selves can be complex systems that need to be decomposed into multiple 
sub-components. Hence, system design must follow a systematic app- 
roach, based on a recursive decomposition strategy. This paper proposes 
a comprehensive verification-driven framework which provides support 
for designers during development. The framework supports hierarchi- 
cal decomposition of components into sub-components through formal 
specification in terms of pre- and post-conditions as well as independent 
development, reuse and verification of sub-components. 


1 Introduction 


Software is usually not a monolithic product: it is often composed of multiple components that interact with each other to provide the desired functionality. Components themselves can be complex, requiring their own decomposition into sub-components. Hence, system design must follow a systematic approach, based on a recursive decomposition strategy that yields a modular structure. A good decomposition and a careful specification should allow components and sub-components to be developed in isolation by different development teams, delegated to third parties [32], or reused off-the-shelf.

In this context, guaranteeing correctness of the system under development becomes particularly challenging because of the intrinsic tension between two main requirements. On the one hand, to handle complexity, we need to enable development of sub-components where only a partial view of the system is available [28]. On the other hand, we must ensure that independently developed and verified (sub-)components can be composed to guarantee global correctness of the resulting system. Thus, we believe that component development should be supported by a process that (1) is intrinsically iterative; (2) supports decentralized development; and (3) guarantees correctness at each development stage.

[Diagrams: (a) Furniture-sale. (b) Shipping. (c) User.]

P1: ship and product info are provided only if a request has been received.
P2: when user requests are processed, offers are considered only after users received information about the desired product.
P3: the furniture service is activated only if the user has decided to purchase.
P4: when a user request is cancelled by the p&d system, no user ack precedes the cancellation.

(d) Properties of the p&d system.

Fig. 1. The p&d running example. The p&d system supports furniture purchase and delivery. It uses two existing web services, which implement furniture-sale and delivery, as well as a component that implements the user interface. These are modeled by the labeled transition systems shown in Figs. 1a-1c. The p&d component under design is responsible for interaction with these components, which form its execution environment. The overall system must ensure satisfaction of the properties informally described in Fig. 1d.

The need for supporting incremental development of components has been 
widely recognized. Some approaches [15,37] synthesize a partial model of com- 
ponents from properties and scenarios and facilitate an iterative development of 
this model through refinement. Others [7,8,10,26,27] provide support for check- 
ing and refining partial models, with the goal of preserving correctness when 
such systems get refined. However, while these techniques guarantee correctness 
at each development stage, they do not address the problem of decentralized 
development. 

In this paper, we describe a unified framework called FIDDle (a Framework 
for Iterative and Distributed Design of components) which supports decentral- 
ized top-down development. FIDDle supports a formal specification of global 
properties, a decomposition process and specification of component interfaces 
by providing a set of tools to guarantee correctness of the different artifacts 
produced during the process. The main contribution of the paper is a method for supporting an iterative and distributed verification-driven component development process through a coherent set of tools. Specific novel contributions are (1) a new formalism, called Interface Partial Labelled Transition System (IPLTS), for specifying components through a decomposition that encapsulates sub-components into unspecified black-box states; (2) an approach to specify the expected behavior of black-box states via pre- and post-conditions expressed in Fluent Linear Time Temporal Logic; and (3) a notion of component correctness and a local verification procedure that guarantees preservation of global properties once the components are composed.

Fig. 2. Overview of the application of FIDDle for developing a component. Thick-bordered components are implemented in FIDDle. Thick-dashed bordered components are supported by the theory presented in this paper but are not yet fully implemented. Thin-dashed bordered components are not discussed in this work.

We illustrate FIDDle using a simple example, the purchase&delivery (p&d) example [14,29] (see Fig. 1). We evaluate FIDDle on a realistic case study obtained by reverse-engineering the executive module of the Mars Rover developed at NASA [12,17,18]. Scalability is evaluated by considering randomly-generated examples.


Organization. Sect. 2 provides an overview of FIDDle. Section 3 gives the necessary background. Section 4 presents Interface Partial Labelled Transition Systems (IPLTS). Section 5 defines a set of algorithms for reasoning on partial components and describes their implementation. Section 6 reports on an evaluation of the proposed approach. Section 7 compares FIDDle with related approaches, and Sect. 8 concludes. Proofs for the theorems in the paper can be found in the Appendix available at http://ksuweb.kennesaw.edu/~pspoleti/fase-appendix.pdf; source code and video of the tool and a complete replication package can be found at https://github.com/claudiomenghi/FIDDLE.


2 Overview 


FIDDle is a verification-driven environment supporting incremental and distributed component development. A high-level view of FIDDle is shown in Fig. 2. FIDDle allows incrementally developing a component through a set of development phases in which human insight and experience are exploited (rounded boxes labeled with a designer icon or a recycle symbol, to indicate design or reuse, respectively) and phases in which automated support is provided (squared boxes labeled with a pair of gearwheels). Automated support allows verifying the current state of the design, synthesizing parts of the partial component, or checking whether a designed sub-component can correctly fit into the original design. The FIDDle development phases are described below.


Creating an Initial Component Design. This phase is identified in Fig. 2 with the symbol (1). The development team formalizes the properties that this component has to guarantee and designs an initial, high-level structure of the component. Designers also formulate properties that the component needs to ensure. The initial component design is created using a state-based formalism that can clearly identify parts (called “sub-components” in this paper), represented as black-box states, whose internal design is delayed to a later stage or split apart for distributed development by other parties. In the following, we refer to the other states as “regular”. Black-box states are enriched with an interface that provides information on the universe of events relevant to the black-box. They are also decorated with pre- and post-conditions that allow distributed teams to develop sub-components without the need to know about the rest of the system. The contract of a black-box state consists of its interface and its pre- and post-conditions.

In the p&d example, the environment (assumed to be given) in which the p&d component will be deployed is composed of the furniture-sale component (Fig. 1a), the shipping component (Fig. 1b) and the user (Fig. 1c). A possible initial design for the p&d component is shown in Fig. 3c. It contains the regular states 1 and 3 and the black-box states 2 and 4. The initial state is state 1. Whenever a userReq event is detected, the component moves from the initial state 1 into the black-box state 2, which represents a sub-component in charge of managing the user request. An event offerRcvd, which indicates that an offer is provided to the user, labels the transition to state 3. The pre- and post-conditions for black-box states 2 and 4 are shown in Fig. 3b. Events prodInfoReq, infoRcvd, shipInfoReq and costAndTime can occur while the component is in the black-box state 2. The pre-condition requires that there is a user request that has not yet been handled, while the post-condition ensures that the furniture-sale and the shipping services provided info on the product and on delivery cost and time. FIDDle supports the developer in checking properties of the initial component design.

The realizability checker confirms the existence of an integration that com- 
pletes the partially specified component and ensures the satisfaction of the prop- 
erties of interest. If such a component does not exist, the designer needs to 
redesign the partially-specified component. The well-formedness checker verifies 
that both the pre- and the post-conditions of black-box states are satisfiable. 
Finally, the model checker verifies whether the (partial) component (together 
with its contract) guarantees satisfaction of the properties of interest. 

(a) FLTL formulation of the p&d properties.
P1: $\Box(\neg((\neg F\_UserReq)\ \mathcal{U}\ (F\_ShipInfoReq \vee F\_ProdInfoReq)))$
P2: $\Box(F\_UserReq \rightarrow (\neg((\neg F\_InfoRcvd)\ \mathcal{U}\ F\_OfferRcvd)))$
P3: $\Box(F\_UsrReq \rightarrow (\neg((\neg F\_UserAck)\ \mathcal{W}\ F\_ShipReq)))$
P4: $\Box((F\_UsrReq \wedge ((\neg F\_UsrReq)\ \mathcal{U}\ F\_ReqCanc)) \rightarrow ((\neg F\_UserAck)\ \mathcal{U}\ F\_ReqCanc))$

(b) Contracts for black-box states of Figs. 3c-3g.
State 2: interface {prodInfoReq, infoRcvd, shipInfoReq, costAndTime}
pre: $\Diamond(F\_UserReq \wedge \neg(F\_RespOk \vee F\_ReqCanc))$
post: $(\Diamond F\_InfoRcvd) \wedge (\Diamond F\_CostAndTime)$
State 4: interface {prodReq, shipReq}
pre: $(F\_UserReq \rightarrow \Diamond F\_InfoRcvd)$
post: $(\Diamond F\_ProdReq) \wedge (\Diamond F\_ShipReq)$
State 5: interface {prodCancel, shipCancel}
pre: $(F\_UserReq \rightarrow \Diamond F\_InfoRcvd)$
post: $\Diamond(F\_ProdCancel) \wedge \Diamond(F\_ShipCancel)$

[Diagrams: (c) Partial p&d. (d) Another partial p&d component. (e) A sub-component for black-box state 2. (f) Another sub-component for black-box state 2. (g) Integration of the sub-component of Fig. 3e and the component of Fig. 3d.]

Fig. 3. The p&d running example: artifacts produced by FIDDle.

In the p&d example, the model checker identifies a problem with the partial solution sketched in Fig. 3c. No matter how the black-box state 2 is to be defined, the p&d component cannot satisfy property P4, since every time reqCanc occurs it is preceded by usrAck. This suggests a re-design of the p&d component, which
may lead to a new model, shown in Fig. 3d. This model includes two regular states: state 1, in which the component waits for a new user request, and state 3, in which the component has provided the user with an offer and is waiting for an answer. The user might accept (userAck) or reject (userNack) an offer and, depending on this choice, either state 4 or state 5 is entered. States 2, 4 and 5 are black-box states, to be refined later. The designer also provides pre- and post-conditions for the black-box states. Pre- and post-conditions of black-box state 2 specify that there is a pending user request, and that cost, time and product information are collected. Pre- and post-conditions of black-box state 4 specify that infoRcvd has occurred after the user request, and that both a product request and a shipping request are performed. Finally, pre- and post-conditions of black-box state 5 specify that infoRcvd has occurred after the user request and before entering the state, and that both the product and the shipping requests are cancelled when leaving the state. This model is checked using the provided tools; since it passes all the checks, it can be used in the next phase of the development.

The design team may choose to refine the component or to distribute the development of unspecified sub-components (represented by black-box states) to other (internal or external) development teams. In both cases, the sub-component can be designed by only considering the contract of the corresponding black-box state. Each team can develop the assigned sub-component or reuse existing components.


Sub-component Development. This phase is identified in Fig. 2 with the symbol (2). Each team can design the assigned sub-component using any available technique, including manual design (left side), reuse of existing sub-components (right side) or synthesis of new ones from the provided specifications (center). The only constraints are that (1) given the stated pre-condition, the sub-component has to satisfy its post-condition, and (2) the sub-component should operate in the same environment as the overall partially specified component. Sub-component development can itself be an iterative process, but neither the model of the environment nor the overall properties of the system can be changed during this process; otherwise, the resulting sub-component cannot be automatically integrated into the overall system.

In the p&d example, development of the sub-component for the black-box state 2 is delegated to an external contractor. Candidate sub-components are shown in Figs. 3e-3f. In the former case, the component requests shipping info details and waits until the shipping service provides the shipment cost and time. Then it queries the furniture-sale service to obtain the product info. In the latter case, the shipping and the furniture services are queried, but the sub-component does not wait for an answer from the furniture-sale service. Since these candidates are fully defined, the well-formedness check is not needed. The substitutability check, however, confirms that, of these, only the sub-component in Fig. 3e satisfies the post-condition in Fig. 3b.


Sub-component Integration. This phase is identified in Fig. 2 with the symbol (3). FIDDle guarantees that if each sub-component is developed correctly w.r.t. the contract of the corresponding black-box state, the component obtained by integrating the sub-components is also correct. In the p&d example, the sub-component in Fig. 3e passes the substitutability check and is thus a valid implementation of the black-box state 2 in Fig. 3d. Their integration is shown in Fig. 3g.
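The integration step can be pictured as a purely structural operation on transition relations. The sketch below is a minimal illustration under our own assumptions, not FIDDle's implementation: transitions entering the black-box state are redirected to the sub-component's initial state, transitions leaving it now leave from the sub-component's designated final state, and sub-component states are tagged with the black-box state to avoid name clashes. The function `integrate` and the state names `e0` to `e4` are hypothetical; only the event order follows the description of Fig. 3e.

```python
def integrate(delta, box, sub_delta, sub_init, sub_final):
    """Replace black-box state `box` by a sub-component (structural sketch only).

    delta and sub_delta are sets of (source, label, target) triples.
    Sub-component states are tagged as (box, q) so they cannot clash with
    states of the enclosing component.
    """
    def tag(q):
        return (box, q)

    result = set()
    for (src, label, dst) in delta:
        # Transitions leaving the box now leave from the sub-component's final state.
        new_src = tag(sub_final) if src == box else src
        # Transitions entering the box now enter the sub-component's initial state.
        new_dst = tag(sub_init) if dst == box else dst
        result.add((new_src, label, new_dst))
    for (src, label, dst) in sub_delta:
        result.add((tag(src), label, tag(dst)))
    return result


# Fragment of the partial p&d component of Fig. 3d (only transitions named in the text).
component = {(1, "userReq", 2), (2, "offerRcvd", 3)}
# Hypothetical encoding of the sub-component of Fig. 3e.
sub_component = {
    ("e0", "shipInfoReq", "e1"),
    ("e1", "costAndTime", "e2"),
    ("e2", "prodInfoReq", "e3"),
    ("e3", "infoRcvd", "e4"),
}
integrated = integrate(component, 2, sub_component, "e0", "e4")
```

Walking the result from state 1 yields userReq, shipInfoReq, costAndTime, prodInfoReq, infoRcvd, offerRcvd, which is the label order visible in Fig. 3g. The sketch performs no substitutability check; in FIDDle, integration is only justified after that check succeeds.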


3 Preliminaries 


The model of the environment and the properties of interest are expressed using 
Labelled Transition Systems and Fluent Linear Time Temporal Logic. 
Model of the Environment. Let $Act$ be the universal set of observable events and let $Act_\tau = Act \cup \{\tau\}$, where $\tau$ denotes an unobservable local event. A Labeled Transition System (LTS) [20] is a tuple $A = (Q, q_0, \alpha A, \Delta)$, where $Q$ is the set of states, $q_0 \in Q$ is the initial state, $\alpha A \subseteq Act$ is a finite set of events, and $\Delta \subseteq Q \times (\alpha A \cup \{\tau\}) \times Q$ is the transition relation. The parallel composition operation is defined as usual (see for example [14]).
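For readers who want something concrete to experiment with, the sketch below encodes an LTS as a Python dataclass and implements one common form of parallel composition. Reading "as usual" as the CSP/FSP-style product (shared observable events synchronize; private events and $\tau$ interleave) is our assumption. The string "tau" stands for $\tau$, and the two example LTSs are hypothetical.

```python
from dataclasses import dataclass

TAU = "tau"  # stands for the unobservable event τ


@dataclass(frozen=True)
class LTS:
    states: frozenset
    init: object
    alphabet: frozenset  # αA, a finite subset of Act (never contains TAU)
    delta: frozenset     # transition relation: triples (source, label, target)


def compose(a: LTS, d: LTS) -> LTS:
    """Parallel composition: shared events synchronize, all others interleave."""
    shared = a.alphabet & d.alphabet
    delta = set()
    for (s, l, s2) in a.delta:
        if l == TAU or l not in shared:
            # a moves alone while d stays in whatever state it is in
            delta.update(((s, t), l, (s2, t)) for t in d.states)
    for (t, l, t2) in d.delta:
        if l == TAU or l not in shared:
            # d moves alone while a stays put
            delta.update(((s, t), l, (s, t2)) for s in a.states)
    for (s, l, s2) in a.delta:
        if l != TAU and l in shared:
            # both move together on a shared observable event
            for (t, l2, t2) in d.delta:
                if l2 == l:
                    delta.add(((s, t), l, (s2, t2)))
    states = frozenset((s, t) for s in a.states for t in d.states)
    return LTS(states, (a.init, d.init), a.alphabet | d.alphabet, frozenset(delta))


# Hypothetical example: a user and a server synchronizing on "req".
user = LTS(frozenset({0, 1}), 0, frozenset({"req", "ok"}),
           frozenset({(0, "req", 1), (1, "ok", 0)}))
server = LTS(frozenset({"a", "b"}), "a", frozenset({"req", "log"}),
             frozenset({("a", "req", "b"), ("b", "log", "a")}))
system = compose(user, server)
```

Here `req` can only fire when both sides offer it, while `ok` and `log` interleave freely.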


Properties. A fluent [33] $Fl$ is a tuple $(I_{Fl}, T_{Fl}, Init_{Fl})$, where $I_{Fl} \subseteq Act$, $T_{Fl} \subseteq Act$, $I_{Fl} \cap T_{Fl} = \emptyset$ and $Init_{Fl} \in \{true, false\}$. A fluent may be true or false. A fluent is true if it has been initialized by an event $i \in I_{Fl}$ at an earlier time point (or if it was initially true, that is, $Init_{Fl} = true$) and has not yet been terminated by another event $t \in T_{Fl}$; otherwise, it is false. For example, consider the LTS in Fig. 1c and the fluent $F\_ReqPend = (\{userReq\}, \{respOk, reqCanc\}, false)$. F_ReqPend holds in a trace of the LTS from the moment at which userReq occurs until a transition labeled with respOk or reqCanc is fired. In the following, we use the notation F_Event to indicate a fluent that is true when the event with label event occurs.
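The fluent semantics can be made concrete by tracking a fluent's value along a finite trace. The sketch below assumes the common convention that the value at position $i$ is the value right after the $i$-th event; the paper's own definitions are in its appendix and may differ in detail. The trace is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fluent:
    initiating: frozenset   # I_Fl
    terminating: frozenset  # T_Fl
    initially: bool         # Init_Fl

    def __post_init__(self):
        if self.initiating & self.terminating:
            raise ValueError("initiating and terminating events must be disjoint")


def fluent_values(fl: Fluent, trace):
    """Truth value of fl right after each event of a finite trace."""
    value = fl.initially
    values = []
    for event in trace:
        if event in fl.initiating:
            value = True      # initiated by an event in I_Fl
        elif event in fl.terminating:
            value = False     # terminated by an event in T_Fl
        values.append(value)  # any other event leaves the value unchanged
    return values


F_ReqPend = Fluent(frozenset({"userReq"}), frozenset({"respOk", "reqCanc"}), False)
trace = ["userReq", "offerRcvd", "respOk", "userReq", "reqCanc"]
print(fluent_values(F_ReqPend, trace))  # [True, True, False, True, False]
```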

An FLTL formula is obtained by composing fluents with standard LTL operators: $\bigcirc$ (next), $\Diamond$ (eventually), $\Box$ (always), $\mathcal{U}$ (until) and $\mathcal{W}$ (weak until). For example, FLTL encodings of the properties P1, P2, P3 and P4 are shown in Fig. 3a.

Satisfaction of FLTL formulae can be evaluated over finite and infinite traces by first constructing an FLTL interpretation of the trace and then evaluating the FLTL formulae over this interpretation. The FLTL interpretation of a finite trace is obtained by slightly changing the interpretation of infinite traces. The evaluation of FLTL formulae on finite traces is obtained by considering the standard interpretation of LTL operators over finite traces (see [13]). In the following, we assume that Definitions 5 and 4 (available in the Appendix) are used to evaluate whether an FLTL formula is satisfied on finite and infinite traces, respectively.
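The finite-trace case can be illustrated with a small evaluator. The sketch below is not the paper's Definition 5. It implements the standard finite-trace LTL semantics (strong next, until bounded by the end of the trace) over sequences of fluent valuations. It also assumes, for the example, that F_Event is true exactly at the positions where its event occurs. Formulas are encoded as nested tuples of our own design.

```python
def holds(f, w, i=0):
    """Finite-trace LTL satisfaction of formula f at position i of w.

    w is a non-empty list of sets: w[j] holds the fluents true at position j.
    Formulas are nested tuples, e.g. ("until", ("not", ("ap", "a")), ("ap", "b")).
    """
    n = len(w)
    op = f[0]
    if op == "ap":
        return f[1] in w[i]
    if op == "not":
        return not holds(f[1], w, i)
    if op == "and":
        return holds(f[1], w, i) and holds(f[2], w, i)
    if op == "or":
        return holds(f[1], w, i) or holds(f[2], w, i)
    if op == "implies":
        return not holds(f[1], w, i) or holds(f[2], w, i)
    if op == "next":  # strong next: a successor position must exist
        return i + 1 < n and holds(f[1], w, i + 1)
    if op == "eventually":
        return any(holds(f[1], w, j) for j in range(i, n))
    if op == "always":
        return all(holds(f[1], w, j) for j in range(i, n))
    if op == "until":
        return any(holds(f[2], w, j) and all(holds(f[1], w, k) for k in range(i, j))
                   for j in range(i, n))
    if op == "wuntil":  # weak until: until, or the left operand holds to the end
        return holds(("until", f[1], f[2]), w, i) or holds(("always", f[1]), w, i)
    raise ValueError(f"unknown operator {op!r}")


def ap(name):
    return ("ap", name)


# Post-condition of black-box state 4: eventually F_ProdReq and eventually F_ShipReq.
post4 = ("and", ("eventually", ap("F_ProdReq")), ("eventually", ap("F_ShipReq")))
# Consequent of P4: no user ack until the request is cancelled.
p4_tail = ("until", ("not", ap("F_UserAck")), ap("F_ReqCanc"))
```

For instance, `holds(post4, [{"F_ProdReq"}, {"F_ShipReq"}])` is true, whereas a sub-trace containing only the product request fails it.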


4 Modeling and Refining Components 


This section introduces a novel formalism for modeling and refining components. 
We define the notion of a partial LTS and then extend it with pre- and post- 
conditions. 


Partial LTS. A partial LTS is an LTS where some states are “regular” and others are “black-box”. Black-box states model portions of the component whose behavior still has to be specified. Each black-box state is augmented with an interface that specifies the universe of events that can occur in the black-box. A Partial LTS (PLTS) is a structure $P = (A, R, B, \sigma)$, where: $A = (Q, q_0, \alpha A, \Delta)$ is an LTS; $Q$ is the set of states, s.t. $Q = R \cup B$ and $R \cap B = \emptyset$; $R$ is the set of regular states; $B$ is the set of black-box states; $\sigma : B \rightarrow 2^{Act}$ is the interface. An LTS is a PLTS where the set of black-box states is empty. The PLTS in Fig. 3d is defined over the regular states 1 and 3, and the black-box states 2, 4 and 5. The interface specifies that events prodInfoReq, infoRcvd, shipInfoReq and costAndTime can occur in the black-box state 2.
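As a data structure, a PLTS is an LTS plus a partition of its states and an interface map. The sketch below is illustrative only. It checks the side conditions $Q = R \cup B$ and $R \cap B = \emptyset$ and that $\sigma$ is defined on $B$. It encodes Fig. 3d using only the transitions named in the text, so the outgoing transitions of states 4 and 5 are omitted. The class name and field names are our own.

```python
from dataclasses import dataclass


@dataclass
class PLTS:
    states: set       # Q
    init: object      # q_0
    alphabet: set     # αA
    delta: set        # triples (source, label, target)
    regular: set      # R
    blackbox: set     # B
    interface: dict   # sigma: black-box state -> set of events

    def __post_init__(self):
        if self.regular | self.blackbox != self.states:
            raise ValueError("Q must equal R ∪ B")
        if self.regular & self.blackbox:
            raise ValueError("R and B must be disjoint")
        if set(self.interface) != self.blackbox:
            raise ValueError("the interface must be defined exactly on B")
        if self.init not in self.states:
            raise ValueError("the initial state must belong to Q")

    def is_lts(self) -> bool:
        """An LTS is a PLTS without black-box states."""
        return not self.blackbox


# Partial encoding of Fig. 3d (transitions out of states 4 and 5 omitted).
pd = PLTS(
    states={1, 2, 3, 4, 5},
    init=1,
    alphabet={"userReq", "offerRcvd", "userAck", "userNack"},
    delta={(1, "userReq", 2), (2, "offerRcvd", 3), (3, "userAck", 4), (3, "userNack", 5)},
    regular={1, 3},
    blackbox={2, 4, 5},
    interface={
        2: {"prodInfoReq", "infoRcvd", "shipInfoReq", "costAndTime"},
        4: {"prodReq", "shipReq"},
        5: {"prodCancel", "shipCancel"},
    },
)
```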


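Definition 1. Given a PLTS $P = (A, R, B, \sigma)$ defined over the LTS $A = (Q^A, q_0^A, \alpha A, \Delta^A)$ and an LTS $D = (Q^D, q_0^D, \alpha D, \Delta^D)$, the parallel composition $P \parallel D$ is an LTS $S = (Q^S, q_0^S, \alpha S, \Delta^S)$ such that $Q^S = Q^A \times Q^D$; $q_0^S = (q_0^A, q_0^D)$; $\alpha S = \alpha A \cup \alpha D$; and the set of transitions $\Delta^S$ is defined as follows:
- $((s, t), l, (s', t)) \in \Delta^S$ if $(s, l, s') \in \Delta^A$ and $l \in \alpha A \setminus \alpha D$ or $l = \tau$;
- $((s, t), l, (s, t')) \in \Delta^S$ if $(t, l, t') \in \Delta^D$ and one of the following is satisfied: (1) $l \in \alpha D \setminus \alpha A$, (2) $l = \tau$, or (3) $s \in B$ and $l \in \sigma(s)$;
- $((s, t), l, (s', t')) \in \Delta^S$ if $(s, l, s') \in \Delta^A$, $(t, l, t') \in \Delta^D$, $l \in \alpha A \cap \alpha D$ and $l \neq \tau$.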
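The three rules of Definition 1 translate almost literally into code. The sketch below is our own illustration, not FIDDle's implementation. The key difference from ordinary composition is rule 2(3): while the component sits in a black-box state $s$, the environment may fire any event of $\sigma(s)$ on its own, even a shared one. The example component and environment are hypothetical simplifications of the p&d setting.

```python
from typing import NamedTuple

TAU = "tau"  # the unobservable event τ


class LTS(NamedTuple):
    states: frozenset
    init: object
    alphabet: frozenset
    delta: frozenset  # triples (source, label, target)


class PLTS(NamedTuple):
    lts: LTS            # the underlying LTS A
    regular: frozenset  # R
    blackbox: frozenset # B
    sigma: dict         # interface: black-box state -> events allowed inside it


def compose(p: PLTS, d: LTS) -> LTS:
    """Parallel composition P || D following the three rules of Definition 1."""
    a = p.lts
    only_a = a.alphabet - d.alphabet
    only_d = d.alphabet - a.alphabet
    shared = a.alphabet & d.alphabet
    delta = set()
    for (s, l, s2) in a.delta:
        # Rule 1: A moves alone on its private events or on tau.
        if l in only_a or l == TAU:
            delta.update(((s, t), l, (s2, t)) for t in d.states)
    for (t, l, t2) in d.delta:
        # Rule 2: D moves alone on its private events, on tau, or on any
        # event of sigma(s) while A is in black-box state s.
        for s in a.states:
            if l in only_d or l == TAU or (s in p.blackbox and l in p.sigma[s]):
                delta.add(((s, t), l, (s, t2)))
    for (s, l, s2) in a.delta:
        # Rule 3: A and D synchronize on shared observable events.
        if l in shared and l != TAU:
            for (t, l2, t2) in d.delta:
                if l2 == l:
                    delta.add(((s, t), l, (s2, t2)))
    states = frozenset((s, t) for s in a.states for t in d.states)
    return LTS(states, (a.init, d.init), a.alphabet | d.alphabet, frozenset(delta))


# Hypothetical component: regular states 1 and 3, black-box state 2.
A = LTS(frozenset({1, 2, 3}), 1,
        frozenset({"userReq", "offerRcvd", "shipInfoReq", "costAndTime"}),
        frozenset({(1, "userReq", 2), (2, "offerRcvd", 3)}))
P = PLTS(A, frozenset({1, 3}), frozenset({2}), {2: frozenset({"shipInfoReq", "costAndTime"})})
# Hypothetical environment sharing all of its events with A.
D = LTS(frozenset({0, 1, 2, 3}), 0,
        frozenset({"userReq", "shipInfoReq", "costAndTime", "offerRcvd"}),
        frozenset({(0, "userReq", 1), (1, "shipInfoReq", 2),
                   (2, "costAndTime", 3), (3, "offerRcvd", 0)}))
S = compose(P, D)
```

In `S`, the environment can perform shipInfoReq from `(2, 1)` because state 2 is a black-box state whose interface contains that event. From `(1, 1)` no move is possible, since state 1 is regular and the event is shared.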


Given $P$, $A$, $D$ defined above, the system $S = P \parallel D$ and a state $q$ of $P$, we say that a finite trace $l_0, l_1, \ldots, l_n$ of $S$ reaches $q$ if there exists a sequence $(s_0, t_0), l_0, (s_1, t_1), \ldots, l_n, (q, t_{n+1})$, where for every $0 \leq i \leq n$, we have $((s_i, t_i), l_i, (s_{i+1}, t_{i+1})) \in \Delta^S$. For example, considering the PLTS in Fig. 3d and the LTS in Fig. 1c, the finite trace obtained by performing a userReq event reaches the black-box state 2 of the PLTS.
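Because $S$ may be nondeterministic, deciding whether a trace reaches $q$ amounts to tracking the set of states in which the trace can end, then checking whether any of them has $q$ as its $P$-component. The sketch below is our own illustration. It treats $\tau$ as an ordinary label that must appear explicitly in the trace, and it uses a hand-written fragment of a composed system.

```python
def states_after(delta_s, init, trace):
    """All states of S in which the finite trace can end (S may be nondeterministic)."""
    current = {init}
    for label in trace:
        current = {dst for (src, l, dst) in delta_s if src in current and l == label}
    return current


def reaches(delta_s, init, trace, q):
    """True iff the trace can end in a state of S whose P-component is q."""
    return any(s == q for (s, _t) in states_after(delta_s, init, trace))


# Hand-written fragment of a composed system S whose states are pairs (s, t).
delta_s = {
    ((1, 0), "userReq", (2, 1)),
    ((2, 1), "shipInfoReq", (2, 2)),
    ((2, 2), "costAndTime", (2, 3)),
    ((2, 3), "offerRcvd", (3, 0)),
}
```

With this fragment, the trace consisting only of userReq reaches state 2, mirroring the example in the text.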

Given a finite trace $\pi = l_0, l_1, \ldots, l_n$ (or an infinite trace $l_0, l_1, \ldots$) of $S$, we say that its sub-trace $l_i, l_{i+1}, \ldots, l_k$ is inside the black-box state $b$ if one of the sub-sequences associated with $\pi$ is in the form $(b, t_i), l_i, (b, t_{i+1}), \ldots, l_k, (b, t_{k+1})$, where $l_i, l_{i+1}, \ldots, l_k \in \sigma(b)$. Note that a sub-trace is a finite trace. For example, considering the parallel composition of the PLTS in Fig. 3d and the LTSs in Figs. 1c and 1b, and the finite trace associated with events userReq, shipInfoReq, offerRcvd, the sub-trace associated with shipInfoReq is inside the black-box state 2. This means that shipInfoReq must occur in the sub-component replacing the black-box state 2.
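Extracting the sub-traces performed inside a black-box state from a run of $S$ is a linear scan. The sketch below returns maximal sub-traces, that is, a whole stay in $b$. That this is the intended reading is our assumption, motivated by post-conditions such as $\Diamond F\_ProdReq \wedge \Diamond F\_ShipReq$, which only make sense over a complete stay. The environment state names in the example are hypothetical.

```python
def subtraces_inside(states, labels, b, sigma_b):
    """Maximal sub-traces of a run that are performed inside black-box state b.

    states[i] is the state (s, t) of S before labels[i], so
    len(states) == len(labels) + 1.
    """
    if len(states) != len(labels) + 1:
        raise ValueError("a run has exactly one more state than labels")
    result, current = [], []
    for i, label in enumerate(labels):
        stays_in_b = states[i][0] == b and states[i + 1][0] == b
        if stays_in_b and label in sigma_b:
            current.append(label)
        else:
            if current:
                result.append(current)
            current = []
    if current:
        result.append(current)
    return result


# Run for the trace userReq, shipInfoReq, offerRcvd (environment states are hypothetical).
run_states = [(1, "u0"), (2, "u1"), (2, "u2"), (3, "u3")]
run_labels = ["userReq", "shipInfoReq", "offerRcvd"]
sigma_2 = {"prodInfoReq", "infoRcvd", "shipInfoReq", "costAndTime"}
```

On this run, the function returns `[["shipInfoReq"]]`, which matches the example in the text.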


Adding Pre- and Post-conditions. The intended behavior of a sub-component refining a black-box state can be captured using pre- and post-conditions. The contract for the sub-component associated with a black-box state consists of the box interface and its pre- and post-conditions. Given the universal set FLTL of the FLTL formulae, an Interface PLTS (IPLTS) $I$ is a structure $(A, R, B, \sigma, pre, post)$, where $(A, R, B, \sigma)$ is a PLTS, $pre: B \rightarrow FLTL$ and $post: B \rightarrow FLTL$.

For each black-box state b, the function pre specifies a constraint that must be 
satisfied by all finite traces of P that reach b. For example, the FLTL-expressed 
pre-condition for the black-box state 4 of the IPLTS in Fig. 3d requires that 
any trace of the composition between the IPLTS and an LTS that reaches the 
black-box state 4 provides info on the product to the user after his/her request. 

For each black-box state b, the function post specifies a post-condition that 
constrains the behavior of the system in any sub-trace performed inside b. For 
example, the post-condition of the black-box state 4 of the IPLTS in Fig. 3d 
ensures that whenever this IPLTS is composed with an LTS, a product request
and a shipping request are performed by the furniture-sale service while the 
system is inside the black-box state. 
Given an IPLTS I and an LTS D, the parallel composition S between I and D is obtained by considering the PLTS P associated with I and the LTS D as specified in Definition 1. Given an IPLTS I, an LTS D and the parallel composition S between I and D, a trace π of S is valid iff it is infinite and, for every black-box state b, the post-condition post(b) holds in any sub-trace of π performed inside b.


Definition 2. Given an LTS D, an IPLTS I is well-formed (over D) iff every valid trace of S = I || D satisfies all the pre-conditions of the black-box states of I.


We say that S = I || D satisfies an FLTL property φ if and only if φ is satisfied by every valid trace of S. In the p&d example, the post-condition ◇(F_ProdReq) ∧ ◇(F_ShipReq) of the black-box state 4 ensures that the parallel composition of the component in Fig. 3d and its environment satisfies P3.


Sub-components and Their Integration. Integration aims to replace the black-box states of a given IPLTS with the corresponding sub-components. Given an IPLTS I, one of its black-box states b and its interface σ(b), a sub-component for b is an IPLTS R defined over the set of events σ(b). One state q_F of R is defined as the final state of R.

Consider a sub-component R, an LTS E of its environment, and a trace of the form π_i; π_e such that π_i = l_0, l_1, ..., l_n and π_e = l_{n+1}, l_{n+2}, ..., l_k. We say that π_i; π_e is a trace of the parallel composition between R and E if and only if:
(1) there exists a sequence q_0, l_0, q_1, l_1, ..., l_n, q_{n+1} in the environment such that, for all i with 0 ≤ i ≤ n, (q_i, l_i, q_{i+1}) is a transition of E;
(2) π_e is obtained by R || E considering q_{n+1} as the initial state for the environment; and
(3) π_e reaches q_F.

A sub-component is valid if it ensures that the traces of the parallel composition satisfy its post-conditions. Intuitively, a trace of the parallel composition between a sub-component R and the environment E is obtained by concatenating two sub-traces, π_i and π_e. The sub-trace π_i corresponds to a set of transitions performed by the environment before the sub-component is activated. The sub-trace π_e is a trace the system generates while it is in the sub-component R.


Definition 3. Given an IPLTS I with a black-box state b, the environment E and a sub-component R for b, R is a substitutable sub-component iff every trace π_i; π_e of the parallel composition between R and E is such that, if π_i satisfies pre(b), then π_e guarantees post(b).
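Definition 3 is an implication over pairs of sub-traces. The sketch below checks that implication on an explicitly enumerated, finite sample of (π_i, π_e) pairs, with pre(b) and post(b) modelled as Python predicates. Under those assumptions it is only a testing aid. The checker in Sect. 5 establishes the property for all traces by model checking. The function name and event names are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Trace = Sequence[str]
Pair = Tuple[Trace, Trace]


def violations(pairs: Iterable[Pair], pre: Callable[[Trace], bool],
               post: Callable[[Trace], bool]) -> List[Pair]:
    """Pairs (pi_i, pi_e) breaking 'pi_i satisfies pre(b) implies pi_e satisfies post(b)'."""
    return [(pi_i, pi_e) for (pi_i, pi_e) in pairs if pre(pi_i) and not post(pi_e)]


sample = [(["userReq"], ["prodReq", "shipReq"]),
          (["userReq"], ["prodReq"]),
          ([], [])]
print(violations(sample,
                 pre=lambda t: "userReq" in t,
                 post=lambda t: "prodReq" in t and "shipReq" in t))
# [(['userReq'], ['prodReq'])]
```

A pair whose π_i does not satisfy the pre-condition never counts as a violation, matching the "if π_i satisfies pre(b)" clause.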


Intuitively, whenever the sub-component is entered and the pre-condition pre(b) is satisfied (i.e., the trace π_i satisfies pre(b)), a trace of the parallel composition between the sub-component and the environment that reaches the final state of the sub-component must satisfy the post-condition post(b).

A black-box state of an IPLTS C can be replaced by a substitutable sub-component R through an integration procedure. The resulting IPLTS C' is called the integration. Intuitively, the integration procedure connects every incoming transition of the considered black-box state to the initial state of the substitutable sub-component R, and every outgoing transition to its final state. Integrating the sub-component R for black-box state 2 in Fig. 3e into the component in Fig. 3d produces the IPLTS in Fig. 3g. The prefix “2.” is used to identify the states obtained from R. The contracts of black-box states 4 and 5 are the same as those in Fig. 3b.
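The integration step can be sketched as a rewiring of transitions. The sketch makes several assumptions:
- C and R are given as sets of (source, label, target) triples.
- The initial and final states of R are known.
- Prefixing R's states with "b." (as the paper does with "2.") keeps state names disjoint.

The function name is hypothetical, and the contracts of the remaining black-box states are not modelled.

```python
from typing import Hashable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]


def integrate(c_transitions: Set[Transition], b: Hashable,
              r_transitions: Set[Transition], r_init: Hashable,
              r_final: Hashable) -> Set[Transition]:
    """Replace black-box state b of C with sub-component R (illustrative)."""
    def rename(q: Hashable) -> str:
        return f"{b}.{q}"                    # e.g. state 0 of R for box 2 -> "2.0"

    result: Set[Transition] = set()
    for src, label, dst in c_transitions:
        new_src = rename(r_final) if src == b else src   # outgoing: leave R's final state
        new_dst = rename(r_init) if dst == b else dst    # incoming: enter R's initial state
        result.add((new_src, label, new_dst))
    # copy R's own transitions under the prefixed state names
    result |= {(rename(s), lab, rename(d)) for (s, lab, d) in r_transitions}
    return result


c = {(1, "userReq", 2), (2, "offerRcvd", 3)}
r = {("0", "shipInfoReq", "1")}
print(sorted(integrate(c, 2, r, "0", "1"), key=str))
```

Because source and target are rewired independently, a self-loop on b (if any) would become a transition from R's final state back to R's initial state.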


Theorem 1. Given a well-formed IPLTS C and a substitutable sub-component R for a black-box state b of C, if C satisfies an FLTL property φ, then the integration C' obtained by substituting b with R also satisfies φ.


The sub-component R from Fig. 3e is substitutable; thus, integrating it into 
the partial component C shown in Fig. 3g ensures that the resulting integrated 
component C' preserves properties P1-P4.


5 Verification Algorithms 


In this section, we describe the algorithms for the analysis of partial components, 
which we have implemented on top of LTSA [25]. 


Checking Realizability. Realizability of a property φ is checked via the following procedure. Let E be the environment of the partial component C, and let C^P be the LTS resulting from removing all black-box states and their incoming and outgoing transitions from C.
(1) Check whether C^P || E ⊨ φ. If φ is not satisfied, the component is not realizable: no matter how the black-box states are specified, there will be a behavior of the system that does not satisfy φ.
(2) Otherwise, compute C || E (as specified in Definition 1) and model-check it against ¬φ. If the property ¬φ is satisfied, the component is not realizable. Indeed, all the behaviors of C || E satisfy ¬φ, i.e., there is no behavior that the component can exhibit to satisfy φ.
(3) Otherwise, the component may be realizable.

For example, the realizability checker shows that it is possible to realize a component refining the one shown in Fig. 3c while satisfying property P2. Specifically, it returns a trace that ensures that, after a userReq event, the offer is provided to the user (the event offerRcvd) only if the furniture service has confirmed the availability of the requested product (the event infoRcvd).
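The two-step decision can be illustrated on a toy scale. In the sketch below, both model-checking calls are replaced by evaluating φ on explicitly listed finite traces. This is a simplification: the real procedure model-checks LTSs against FLTL formulae over infinite traces. The predicate `p2` is a hypothetical finite-trace reading of property P2.

```python
from typing import Callable, Iterable, Sequence

Trace = Sequence[str]


def realizability(traces_without_boxes: Iterable[Trace],
                  traces_with_boxes: Iterable[Trace],
                  phi: Callable[[Trace], bool]) -> str:
    """Two-step realizability check over explicitly listed traces.

    traces_without_boxes stands for the behaviours of C^P || E and
    traces_with_boxes for those of C || E.
    """
    # Step 1: behaviours that avoid the black-box states are kept by every refinement.
    if not all(phi(t) for t in traces_without_boxes):
        return "not realizable"
    # Step 2: if every behaviour of C || E satisfies not-phi, none can satisfy phi.
    if all(not phi(t) for t in traces_with_boxes):
        return "not realizable"
    return "may be realizable"


def p2(t: Trace) -> bool:
    """offerRcvd may occur only after infoRcvd (hypothetical finite-trace reading of P2)."""
    return "offerRcvd" not in t or "infoRcvd" in t[:t.index("offerRcvd")]


print(realizability([["userReq"]],
                    [["userReq", "infoRcvd", "offerRcvd"], ["userReq", "offerRcvd"]],
                    p2))  # may be realizable
```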


Theorem 2. Given a component specified using an IPLTS C, its environment E, and a property of interest φ, the realizability checker returns “not realizable” if there is no component C' obtained from C by integrating sub-components such that (C' || E) ⊨ φ.


Checking Well-Formedness. Given a partial component C with a black-box state b annotated with a pre-condition pre(b), and its environment E, the well-formedness checker verifies whether pre(b) is satisfied in C as follows.


Supporting Verification-Driven Incremental Distributed Design 179

(1) Transform post-conditions into LTSs. Transform every FLTL post-condition post(b_j) of every black-box state b_j of C, including b, into an FLTL formula post(b_j)' as specified in [13]. This transformation ensures that the infinite traces that satisfy post(b_j)' have the form π, {end}^ω, where π satisfies post(b_j). For each black-box state b_j, the corresponding post-condition post(b_j)' is transformed into an equivalent LTS, called LTS_{b_j}, using the procedure in [37]. Since LTS_{b_j} has traces of the form π, {end}^ω, it has a state s with an end-labelled self-loop. This self-loop is removed, and s is considered the final state of LTS_{b_j}. All other end-labelled transitions are replaced by τ-transitions. Each automaton LTS_{b_j} contains all the traces that do not violate the corresponding post-condition.
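The post-processing at the end of Step (1) can be sketched as follows. The sketch does not cover the formula-to-LTS translation of [37]; it takes that translation's output as a set of (source, label, target) triples. It assumes exactly one state carries an end-labelled self-loop. The labels "end" and "tau" and the function name are illustrative.

```python
from typing import Hashable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]


def close_post_lts(transitions: Set[Transition], end: str = "end",
                   tau: str = "tau") -> Tuple[Set[Transition], Hashable]:
    """Turn the LTS of post(b_j)' into LTS_{b_j}; returns (transitions, final state).

    Assumes exactly one state carries an end-labelled self-loop.
    """
    loops = {s for (s, lab, d) in transitions if lab == end and s == d}
    if len(loops) != 1:
        raise ValueError("expected exactly one state with an end-labelled self-loop")
    final = next(iter(loops))
    result: Set[Transition] = set()
    for s, lab, d in transitions:
        if lab == end and s == d == final:
            continue                                    # drop the end self-loop
        result.add((s, tau if lab == end else lab, d))  # other end-edges become tau
    return result, final


lts, final = close_post_lts({(0, "prodReq", 1), (1, "end", 2), (2, "end", 2)})
print(final)  # 2
```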

(2) Integrate the LTSs of all the black-box states b_j ≠ b. For every black-box state b_j ≠ b, eliminate b_j and add LTS_{b_j} to C as follows. Replace every incoming transition of b_j with a transition whose destination is the initial state of LTS_{b_j}. Replace every outgoing transition of b_j with a transition whose source is the final state of LTS_{b_j}. This step creates an LTS which encodes all the traces of the component that do not violate any post-conditions of its black-box states.

(3) Integrate the LTS of the black-box state b. Integrate LTS_b into C together with two additional states, q1 and q2, and call the resulting model C'. Then apply the following modifications:
- Replace every incoming transition of b by a transition with destination q1.
- Replace every outgoing transition of b by a transition whose source is the final state of LTS_b.
- Add a transition labelled with τ from q1 to the initial state of LTS_b.
- Add a self-loop labelled with an event end to q2.
- Add a τ-transition from q1 to q2.
The obtained LTS C' encodes all the valid traces of the system. When a valid trace reaches the black-box state b, C' can enter state q2, from which only the end-labelled self-loop is available.
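Step (3) differs from a plain integration only in the extra entry state q1 and the end-looping exit q2. The sketch below assumes that the state names of C, LTS_b, q1 and q2 are pairwise disjoint and that all models are sets of (source, label, target) triples. The function name and default labels are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]


def attach_checked_box(c_transitions: Iterable[Transition], b: Hashable,
                       lts_b: Iterable[Transition], b_init: Hashable,
                       b_final: Hashable, q1: Hashable = "q1", q2: Hashable = "q2",
                       tau: str = "tau", end: str = "end") -> Set[Transition]:
    """Plug LTS_b in place of b, adding q1 --tau--> q2 --end--> q2 (illustrative)."""
    result: Set[Transition] = set()
    for src, lab, dst in c_transitions:
        new_src = b_final if src == b else src    # outgoing edges leave LTS_b's final state
        new_dst = q1 if dst == b else dst         # incoming edges now enter q1
        result.add((new_src, lab, new_dst))
    result |= set(lts_b)
    result |= {(q1, tau, b_init),                 # continue into LTS_b ...
               (q1, tau, q2),                     # ... or stop at the box entrance
               (q2, end, q2)}                     # and loop on `end` forever
    return result


c = {(1, "userReq", 4), (4, "offerRcvd", 5)}
out = attach_checked_box(c, 4, {("b0", "prodReq", "b1")}, "b0", "b1")
print(len(out))  # 6
```

The τ-transition from q1 to q2 is what lets the traces of C' || E stop exactly at the entrance of b. Step (4) relies on this to check pre(b).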

(4) Verify. Recall that the pre-condition pre(b) of b is defined over finite traces, i.e., those that reach the initial state of the sub-component to be substituted for b. To use standard verification procedures, we transform pre(b) into an equivalent formula, pre(b)', over infinite traces. This transformation, specified in [13], ensures that every trace of the form π, {end}^ω satisfies pre(b)' iff π satisfies pre(b). By construction in step 3 above, C' || E has a valid trace of this form, which is generated when C || E reaches the initial state of the LTS LTS_b associated with the black-box state b of C. To check the pre-condition, we verify whether C' || E ⊨ pre(b)' using traditional model checking.


In the p&d example, suppose we remove the clause ◇F_InfoRcvd from the post-condition of the black-box state 2. The p&d component is then not well-formed, since the pre-condition of state 4 is violated. The counterexample shows a trace that reaches the black-box state 4 in which an event userReq is not followed by infoRcvd. Adding ◇F_InfoRcvd to the post-condition of state 2 solves the problem.


Theorem 3. Given a partial component C with a black-box state b annotated 
with a pre-condition pre(b) and its environment E, the well-formedness procedure 
returns true iff the valid traces of C satisfy the pre-condition pre(b). 


Model Checking. To check whether C || E satisfies φ, we first construct an LTS C' that generates only valid traces. We do so by plugging into C the LTSs corresponding to all of its black-box states (as done in steps 1 and 2 of the well-formedness check). We then use a classical FLTL model checker to verify C' || E ⊨ φ. Consider the design of Fig. 3d, and assume that the black-box state 2 is not associated with any post-condition. The model checker then returns the counterexample userReq, τ, offerRcvd for property P2, since the sub-component that will replace the black-box state 2 is not forced to ask to book the furniture service. Adding the post-condition in Fig. 3b solves the problem.


Theorem 4. The model checking procedure returns true iff every valid trace of C || E satisfies φ.


Checking Substitutability. Given the environment E, a component C with 
a black-box state b and pre- and post-conditions pre(b) and post(b), and a sub- 
component R, this procedure checks whether R can be used in C in place of b. 
We first present a procedure assuming that R has no black-box states. 


(1) Transform the pre-condition pre(b) into an LTS, called LTS_e, using Step (1) of the well-formedness procedure.

(2) Compute the sequential composition (LTS_e.R) between LTS_e and R. This is done by connecting the final state q1 of LTS_e with the initial state of R by a transition labelled with a fresh event init. Then, the final state of R is connected to an additional state q2 through a τ-labelled transition. A self-loop labelled with a fresh event end is added to q2. Performing these steps ensures that the prefix π of every infinite trace of the form π, {end}^ω is composed of two parts, π = π_1; π_2, where π_1 satisfies pre(b) and π_2 is generated by the LTS R.

(3) Verify the result. The formula λ = init → □(post(b)) must hold on any trace π that reaches the final state of R. Let λ' be the result of applying the finite- to infinite-trace FLTL transformation [13] to λ. This transformation ensures that π satisfies λ iff the trace π, {end}^ω satisfies λ'. This, in turn, can be verified by checking ((LTS_e.R) || E) ⊨ λ' using a classical model checker.
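The construction in Step (2) is a purely structural operation on the two automata. The sketch below assumes that LTS_e and R are sets of (source, label, target) triples with disjoint state names, and that "init", "tau" and "end" do not already occur as labels. The function name and the toy automata are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]


def sequential_composition(pre_lts: Iterable[Transition], pre_final: Hashable,
                           r_lts: Iterable[Transition], r_init: Hashable,
                           r_final: Hashable, q2: Hashable = "q2",
                           init: str = "init", tau: str = "tau",
                           end: str = "end") -> Set[Transition]:
    """Build (LTS_e.R): LTS_e --init--> R --tau--> q2 with an end self-loop on q2."""
    result = set(pre_lts) | set(r_lts)
    result.add((pre_final, init, r_init))   # fresh event marks entry into R
    result.add((r_final, tau, q2))          # leaving R's final state
    result.add((q2, end, q2))               # end-labelled self-loop
    return result


seq = sequential_composition({("p0", "userReq", "p1")}, "p1",
                             {("r0", "prodReq", "r1")}, "r0", "r1")
print(len(seq))  # 5
```

The fresh init event separates π_1 from π_2 on every trace, which is what allows λ to constrain only the part generated by R.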


If R contains black-box states, checking R requires performing Steps (1) and 
(2) of the well-formedness check before running the substitutability procedure. 

In the p&d example, the substitutability checker does not return any coun- 
terexample for the sub-component in Fig. 3e. Thus, the post-condition is satisfied 
and the sub-component can be integrated in place of the black-box state 2. 


Theorem 5. Let a component C with a black-box state b, its pre- and post-conditions pre(b) and post(b), a sub-component R, and C's environment E be given. The substitutability checker returns true, indicating that R can be used in C in place of b, iff the following holds for every trace π = π_i; π_e of R || E: if π_i is the finite prefix of E satisfying pre(b) and π_e is obtained by R || E considering the final state of π_i as the initial state of the environment, then π_e satisfies post(b).


6 Evaluation 


We aim to answer two questions: RQ.1: How effective is FIDDle w.r.t. support- 
ing an iterative, distributed development of correct components? (Sect. 6.1) and 
RQ.2: How scalable is the automated part of the proposed approach? (Sect. 6.2). 
6.1 Assessing Effectiveness


We simulated the development of a complex component and analyzed the support provided by FIDDle along the steps described in Sect. 2.


Experimental Setup. We chose the executive module of the K9 Mars Rover 
developed at NASA Ames [12, 17,18] and specified using LTSs. The overall size of 
the LTS is ~10" states. The executive module was made by several components: 
Executive, ExecCondChecker, ActionExecution and Database. ExecCondChecker was further decomposed into db-monitor and internal. Each of these components was associated with a shared variable (exec, conditionList, action and db, respectively) used to communicate with the other components; e.g., the exec variable was used by ExecCondChecker to communicate with Executive. Access to each shared variable was regulated through a condition variable and a lock. The complete model of the Executive component comprised 11 states, each further decomposed as an LTS. The final model of the Executive component was obtained by replacing these states with the corresponding LTSs. This model had about 100 states, which is a realistic component of medium size [5,6,24].

We considered two properties: (P1): Executive performed an action only 
after a new plan was read from Database; (P2): Executive got the lock over the 
condList variable only after obtaining the exec lock. 


Creating an Initial Component Design. We considered the existing model (D3) of the Executive and abstracted portions of the complete model into black-box states to create two partial components, D1 and D2, representing partial designs. To generate D2, we encapsulated three states that receive plans and prepare for plan execution into the black-box state Read_Plans. To generate D1, we additionally set one of the 10 states of the Executive as a black-box state: state ExecuteTaskAction, whose corresponding LTS is in charge of executing a plan. By following this procedure, D3 and D2 can be obtained from D2 and D1, respectively, by integrating the abstracted sub-components.

We considered the (partial) components D1, D2 and D3 and used FIDDle to iteratively develop and check their contracts. For D1, the steps were as follows:
(1) The realizability checker confirmed the existence of a model that refined D1 and satisfied the properties of interest.
(2) The model checker returned a counterexample for both properties of interest. For P1, the model checker returned a counterexample in which no plan was read and yet an action was performed. For P2, the counterexample was one in which Executive got the condList lock without possessing the exec lock. To guarantee the satisfaction of P1, we specified a post-condition for the black-box state Read_Plans ensuring that a plan was read. We also added a pre-condition requiring that an action was not under execution when the black-box state Read_Plans was entered.
(3) The well-formedness checker returned a counterexample trace that reached the black-box state Read_Plans while an action was under execution.
(4) To ensure well-formedness, we added a post-condition to the black-box state ExecuteTaskAction ensuring that an action was not under execution when the system exited the black-box state.
(5) The model checker confirmed that P1 held.
(6) To guarantee the satisfaction of P2, we added a post-condition to the black-box state Read_Plans ensuring that, when the control left the black-box, P2 remained true and the Executive had the exec lock.

For design D2, the steps were as follows:
(1) The realizability checker confirmed the existence of a model that refined D2 and satisfied the properties of interest.
(2) We ran the model checker, which returned a counterexample for both properties of interest.
(3) We added to the black-box state Read_Plans the same pre- and post-conditions as those developed for design D1 and ran the well-formedness checker and the model checker.
(4) The well-formedness checker confirmed that D2 satisfied the pre-condition of the black-box state Read_Plans; the model checker certified the satisfaction of P1 and P2.

Since the model of Executive was complete, we ran only the model checker 
to check D3. Properties P1 and P2 were satisfied. 


Sub-component Development. We simulated a refinement process in which pre- and post-conditions were given to third parties for sub-component development. We considered the sub-components SUB1 and SUB2, which contain the portion of the Executive component abstracted by the black-box states ExecuteTaskAction and Read_Plans, respectively. We ran the substitutability checker, which confirmed that SUB1 and SUB2 ensured the post-conditions of the black-box states ExecuteTaskAction and Read_Plans given their pre-conditions.


Sub-component Integration. We then plugged the designed sub-components into their corresponding black-box states. We integrated each sub-component
into design D1 and used the model checker to verify the resulting (partial) 
components w.r.t. properties P1-P2. The properties were satisfied, as intended. 


Results. FIDDle was effective in analyzing partial components and helping 
change their design to ensure the satisfaction of the properties of interest. 
The experiment confirmed the possibility of distributing the design of sub- 
components for the black-box states. As expected, no rework at the integra- 
tion level was required, i.e., integration produced components that satisfied the 
properties of interest. This confirmed that FIDDle supports verification-driven 
iterative and distributed development of components. 


Threats to Validity. A threat to construct validity concerns the (manual) construction of the intermediate models produced during development by abstracting an existing component model, and the design of the properties to be considered.
However, the intermediate partial designs and the selected properties were based 
on original developer comments present in the model. A threat to internal valid- 
ity concerns the design of the contracts (pre- and post- conditions and interfaces) 
for the black-box states chosen along the process. However, pre- and post- con- 
ditions were chosen and designed by consulting property specification patterns 
proposed in literature [16]. The fact that a single example has been considered is 
a threat to external validity. However, the considered example is a medium-size 
complex real case study [6, 22,35]. 
Table 1. Results of experiments E1 and E2.

E1: Tw/Tm
#EnvStates \ #CompStates | 10    | 50    | 100   | 250   | 500   | 750   | 1000
10                       | 1.45  | 1.26  | 1.51  | 1.29  | 1.42  | 1.43  | 1.31
100                      | 1.15  | 1.25  | 1.50  | 1.08  | 0.88  | 1.02  | 2.33
1000                     | 1.39  | 1.23  | 0.60  | 1.44  | 4.90  | 1.00  | 2.83

E2: Ts/Tm
#EnvStates \ #CompStates | 10    | 50    | 100   | 250   | 500   | 750   | 1000
10                       | 2.20  | 4.37  | 2.18  | 1.50  | 2.19  | 1.62  | 1.62
100                      | 3.51  | 4.66  | 3.61  | 2.80  | 3.18  | 1.96  | 2.73
1000                     | 13.98 | 8.12  | 3.84  | 2.64  | 2.83  | 2.91  | 2.00


6.2 Assessing Scalability 


We set up two experiments (E1 and E2) comparing performance of the well- 
formedness and the substitutability checkers w.r.t. classical model checking as 
the size of the partial components under development and their environments 
grew. Our experiments were based on a set of randomly-generated models. 


E1. To evaluate the well-formedness checker, we generated an LTS model of the 
environment and a complete model for the component. We checked the parallel 
composition between the component and the environment w.r.t. a property of 
interest using a standard model checker. Then, we generated a partial component 
by marking one of the states of the complete component as a black-box, defined pre- and post-conditions for it, and ran the well-formedness checker, comparing the performance of the two.


E2. To evaluate the substitutability checker, we generated a complete component 
as in the previous experiment. Then, we extracted a sub-component by selecting 
half of the component states and the transitions between them. States q0 and qf were added to the sub-component as the initial and final state, respectively. State q0 (resp., qf) was connected with all the states of the sub-component that had, in
the original component, at least one incoming (resp., outgoing) transition from 
(resp., to) a state that was not added to the sub-component. We defined the pre- 
and post-conditions for the sub-component and ran the substitutability checker 
comparing its performance with model-checking. 
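The sub-component extraction used in E2 can be sketched as follows. The text does not say which labels the edges from q0 and to qf carry; the sketch uses τ, and this is an assumption. The function name is hypothetical, and the optional `chosen` parameter exists only to make the selection reproducible.

```python
import random
from typing import Hashable, Iterable, Optional, Sequence, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]


def extract_subcomponent(states: Sequence[Hashable],
                         transitions: Iterable[Transition],
                         chosen: Optional[Set[Hashable]] = None,
                         seed: int = 0,
                         tau: str = "tau") -> Tuple[Set[Hashable], Set[Transition]]:
    """Return (selected states, sub-component transitions) with entry q0 and exit qf."""
    transitions = list(transitions)
    if chosen is None:
        chosen = set(random.Random(seed).sample(list(states), len(states) // 2))
    sub = {(s, lab, d) for (s, lab, d) in transitions if s in chosen and d in chosen}
    # states entered from outside the selection get an edge from q0
    entries = {d for (s, _lab, d) in transitions if d in chosen and s not in chosen}
    # states with an edge leaving the selection get an edge to qf
    exits = {s for (s, _lab, d) in transitions if s in chosen and d not in chosen}
    sub |= {("q0", tau, q) for q in entries}
    sub |= {(q, tau, "qf") for q in exits}
    return chosen, sub


_, sub = extract_subcomponent([0, 1, 2, 3], [(0, "a", 1), (1, "b", 2), (2, "c", 3)], chosen={1, 2})
print(sorted(sub, key=str))
```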


Experimental Setup. We implemented a random model generator to create 
LTSs with a specified number of states, transition density (transitions per state) 
and number of events. We generated environments with an increasing number 
of states: 10, 100 and 1000. We have chosen 10 as a fixed value for the transition 
density and 50 as the cardinality of the set of events. We considered components 
with 10, 50, 100, 250, 500, 750 and 1000 states. The components were generated 
using the same transition density and number of events as in the produced 
environment. To produce the partial component, we considered one of the states 
of the component obtained previously as a black-box, and randomly selected 
25% of the events of the component as the interface of the partial component. 
To produce the sub-component, we randomly extracted half of the component 
states and the transitions between them. 
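A random generator with the three parameters mentioned above (number of states, transition density, number of events) might look like the following. This is a sketch under our own assumptions, not the authors' generator. Each state receives exactly `density` outgoing transitions with distinct (event, target) pairs. Events are named e0, e1, and so on. The interface is 25% of the events, rounded down. All names are hypothetical.

```python
import random
from typing import List, Set, Tuple

Transition = Tuple[int, str, int]


def random_lts(num_states: int, density: int, num_events: int,
               seed: int = 0) -> List[Transition]:
    """Random LTS over states 0..num_states-1 and events e0..e{num_events-1}."""
    if density > num_states * num_events:
        raise ValueError("density exceeds the number of distinct (event, target) pairs")
    rng = random.Random(seed)
    transitions: List[Transition] = []
    for src in range(num_states):
        # sample distinct (event, target) pairs encoded as integers
        for idx in rng.sample(range(num_events * num_states), density):
            event, dst = divmod(idx, num_states)
            transitions.append((src, f"e{event}", dst))
    return transitions


def random_interface(num_events: int, fraction: float = 0.25,
                     seed: int = 0) -> Set[str]:
    """Randomly pick `fraction` of the events (rounded down) as the box interface."""
    rng = random.Random(seed)
    events = [f"e{i}" for i in range(num_events)]
    return set(rng.sample(events, int(num_events * fraction)))


env = random_lts(num_states=100, density=10, num_events=50, seed=1)
print(len(env), len(random_interface(50)))  # 1000 12
```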
Properties of Interest, Pre- and Post-conditions. We considered properties
K1 = □(Q → P), K2 = □(Q → (¬P U Q)) and K3 = □(Q → □(¬P)), which correspond to commonly used property patterns [16], and where Q and P are appropriately defined fluents. We considered K1, K2 and K3 as pre- and post-conditions for the black-box.


Methodology and Results. We ran each experiment 5 times on a 2 GHz Intel Core i7 with 8 GB of 1600 MHz DDR3 RAM. For each combination of values of #EnvStates and #CompStates, we computed two ratios between average times. For experiment E1, we used the well-formedness checker (Tw) and the model checker (Tm). For experiment E2, we used the substitutability checker (Ts) and the model checker (Tm) (see Table 1). The results show that the well-formedness and substitutability checkers scale similarly to the classical model checker.


Threats to Validity. The procedure employed to randomly generate models is 
a threat to construct validity. However, the transition density of the components 
was chosen based on the Mars Rover example. Furthermore, the number of states 
of the sub-component was chosen such that the ratio between the sizes of the 
component and the sub-component was approximately the same as in the Mars Rover. The properties considered in the experiment are a threat to internal
validity. However, they were chosen by consulting property specification patterns 
proposed in literature [16]. Considering a single black-box state is a threat to 
external validity. However, our goal was to evaluate how FIDDle scales with 
respect to the component and the environment sizes and not w.r.t. the number 
of black-box states and the size of the post-conditions. Considering multiple 
black-box states reduces to the case of considering a single black-box with a 
more complex post-condition. 


7 Related Work 


We discuss approaches for developing incrementally correct components. 


Modeling Partiality. Modal Transition Systems [21], Partial Kripke Structures 
[8], and LTS? [17] support the specification of incomplete concurrent systems 
and can be used in an iterative development process. Other formalisms, such as 
Hierarchical State Machines (HSMs) [4], are used to model sequential processes 
via a top-down development process but can only be analyzed when a fully- 
specified model is available. 


Checking Partial Models. Approaches to analyze partial models (e.g., [8,10]) 
are not applicable to the problem considered in this paper where missing sub- 
components are specified using contracts and their development is distributed 
across different development teams. The assumption generation problem for 
LTSs [17] is complementary to the one considered in this paper and concerns the 
computation of an assumption that describes how the system model interacts 
with the environment. 
Substitutability Checking. The goal of substitutability checking is to verify
whether a possibly partial sub-component can be plugged into a higher level 
structure without affecting its correctness. Problems such as “compositional 
reasoning” [1,19,30], “component substitutability” [9], and “hierarchical model 
checking” [4] are related to this part of our work. Our work differs because we 
first guarantee that the properties of interest are satisfied in the initially-defined 
partial component and then check that the provided sub-components can be 
plugged into the initial component. 


Synthesis. Program synthesis [14,31] aims at computing a model of the system 
that satisfies the properties of interest. Moreover, synthesis can be used to generate assumptions on a system’s environment to make its specification realizable
(e.g., [23]). Sketch [36] supports programmers in describing an initial structure 
of the program that can be completed using synthesis techniques, but does not 
explicitly consider models. Many techniques for synthesizing components have 
been proposed, e.g., [14,37], and a fully automated synthesis of highly non-trivial components with over 2000 states is becoming possible [11] for special
cases, by limiting the types of synthesizable goals and using heuristics. However, 
such cases might not be applicable in general. Recent work has been done in 
the direction of compositional [2,3] and distributed [34] synthesis. We do not 
consider our approach to be an alternative to synthesis, but instead a way to 
combine synthesis techniques with the human design. 


8 Conclusion 


We presented a verification-driven methodology, called FIDDle, to support itera- 
tive distributed development of components. It enables recursively decomposing 
a component into a set of sub-components so that the correctness of the overall 
component is ensured. Development of sub-components that satisfy their speci- 
fications can then be done independently, via distributed development. We have 
evaluated FIDDle on a realistic Mars Rover case study. Scalability was evaluated 
using randomly generated examples. 
Abstract. As developers often use third-party libraries to facilitate
software development, the lack of proper API documentation for 
these libraries undermines their reuse potential. And although several 
approaches extract usage examples for libraries, they are usually tied 
to specific language implementations, while their produced examples are 
often redundant and are not presented as concise and readable snip- 
pets. In this work, we propose a novel approach that extracts API 
call sequences from client source code and clusters them to produce a 
diverse set of source code snippets that effectively covers the target API. 
We further construct a summarization algorithm to present concise and 
readable snippets to the users. Upon evaluating our system on software 
libraries, we show that it achieves high coverage in API methods, while
the produced snippets are of high quality and closely match handwritten 
examples. 


Keywords: API usage mining · Documentation · Source code reuse · Code summarization · Mining software repositories


1 Introduction 


Third-party libraries and frameworks are an integral part of current software 
systems. Access to the functionality of a library is typically offered by its API, 
which may consist of numerous classes and methods. However, as noted by mul- 
tiple studies [24,30], APIs often lack proper examples and documentation and, 
in general, sufficient explanation of how they should be used. Thus, developers often use general-purpose or specialized code search engines (CSEs), and Question-Answering (QA) communities, such as Stack Overflow, in order to find possible API usages. However, the search process in these services can be time consuming [13], while the source code snippets provided in web sites and QA communities might be difficult to recognise, ambiguous, or incomplete [28, 29].

© The Author(s) 2018

As a result, several researchers have studied the problem of API usage min- 
ing, which can be described as automatically identifying a set of patterns that 
characterize how an API is typically used from a corpus of client code [11]. There 
are two main types of API mining methods. First are methods that return API 
call sequences, using techniques such as frequent sequence mining [31-33], clus- 
tering [25,31,33], and probabilistic modeling [9]. Though interesting, API call 
sequences do not always describe important information like method arguments 
and control flow, and their output cannot be directly included in one’s code. 

A second class of approaches automatically produces source code snippets 
which, compared to API call sequences, provide more information to the devel- 
oper, and are more similar to human-written examples. Methods for mining 
snippets, however, tend to rely on detailed semantic analysis, including program 
slicing [5, 13-15] and symbolic execution [5], which can make them more difficult 
to deploy to new languages. Furthermore, certain approaches do not use any clustering techniques, thus resulting in a redundant and non-diverse set of API source code snippets [20]. As noted by Fowkes and Sutton [9], such a set is not representative, as it only uses a few API methods. On the other hand, approaches that do use clustering techniques are usually limited by their choice of clustering algorithms [34] and/or use feature sets that are language-specific [13-15].

In this paper, we propose CLAMS (Clustering for API Mining of Snip- 
pets), an approach for mining API usage examples that lies between snippet 
and sequence mining methods, which ensures lower complexity and thus could 
apply more readily to other languages. The basic idea is to cluster a large set 
of usage examples based on their API calls, generate summarized versions for 
the top snippets of each cluster, and then select the most representative snippet 
from each cluster, using a tree edit distance metric on the ASTs. This results in a 
diverse set of examples in the form of concise and readable source code snippets. 
Our method is entirely data-driven, requiring only syntactic information from 
the source code, and so could be easily applied to other programming languages. 
We evaluate CLAMS on a set of popular libraries, where we illustrate how its 
results are more diverse in terms of API methods than those of other approaches, 
and assess to what extent the snippets match human-written examples. 


2 Related Work 


Several studies have pointed out the importance of API documentation in the 
form of examples when investigating API usability [18,22] and API adoption in 
cases of highly evolving APIs [16]. Different approaches have thus been presented 
to find or create such examples: from systems that search for examples on web 
pages [28], to ones that mine such examples from client code located in source 
code repositories [5], or even from video tutorials [23]. Mining examples from 
client source code has been a typical approach for Source Code-Based Recom- 
mendation Systems (SCoReS) [19]. Such methods are distinguished according 
to their output, which can be either source code snippets or API call sequences. 
2.1 Systems that Output API Call Sequences 


One of the first systems to mine API usage patterns is MAPO [32] which employs 
frequent sequence mining [10] to identify common usage patterns. Although the 
latest version of the system outputs the API call sequences along with their asso- 
ciated snippets [33], it is still more of a sequence-based approach, as it presents 
the code of the client method without performing any summarization, while it 
also does not consider the structure of the source code snippets. 

Wang et al. [31] argue that MAPO outputs a large number of usage patterns, 
many of which are redundant. The authors therefore define scalability, succinctness 
and high coverage as the required characteristics of an API miner and 
construct UP-Miner, a system that mines probabilistic graphs of API method 
calls and extracts more useful patterns than MAPO. However, the presentation 
of such graphs can be overwhelming when compared to ranked lists. 

Recently, Fowkes and Sutton [9] proposed a method for mining API usage 
patterns called PAM, which uses probabilistic machine learning to mine a less 
redundant and more representative set of patterns than MAPO or UP-Miner. 
This paper also introduced an automated evaluation framework, using handwritten 
library usage examples from GitHub, which we adapt in the present work. 


2.2 Systems that Output Source Code Snippets 


A typical snippet mining system is eXoaDocs [13-15] that employs slicing tech- 
niques to summarize snippets retrieved from online sources into useful documen- 
tation examples, which are further organized using clustering techniques. How- 
ever, clustering is performed using semantic feature vectors approximated by 
the Deckard tool [12], and such features are not straightforward to extract 
for different programming languages. Furthermore, eXoaDocs only targets usage 
examples of single API methods, as its feature vectors do not include information 
for mining frequent patterns with multiple API method calls. 

APIMiner [20] introduces a summarization algorithm that uses slicing to 
preserve only the API-relevant statements of the source code. Further work by 
the same authors [4] incorporates association rule techniques, and employs an 
improved version of the summarization algorithm, with the aim of resolving 
variable types and adding descriptive comments. Yet the system does not cluster 
similar examples, while most examples show the usage of a single API method. 

Even when slicing is employed in the aforementioned systems, the examples 
often contain extraneous statements (i.e. statements that could be removed as 
they are not related to the API), as noted by Buse and Weimer [5]. Hence, 
the authors introduce a system that synthesizes representative and well-typed 
usage examples using path-sensitive data flow analysis, clustering, and pattern 
abstraction. The snippets are complete and abstract, including abstract naming 
and helpful code, such as try/catch statements. However, the sophistication of 
their program analysis makes the system more complex [31], and increases the 
required effort for applying it to new programming languages. 
Allamanis and Sutton [1] present a system for mining syntactic idioms, which 
are syntactic patterns that recur frequently and are closely related to snippets, 
and thus many of their mined patterns are API snippets. That method is lan- 
guage agnostic, as it relies only on ASTs, but uses a sophisticated statistical 
method based on Bayesian probabilistic grammars, which limits its scalability. 

Although the aforementioned approaches can be effective in certain scenarios, 
they also have several drawbacks. First, most systems output API call sequences 
or other representations (e.g. call graphs), which may not be as helpful as snip- 
pets, both in terms of understanding and from a reuse perspective (e.g. adapting 
an example to fit one’s own code). Several of the systems that output snippets 
do not group them into clusters and thus they do not provide a diverse set of 
usage examples, and even when clustering is employed, the set of features may 
not allow extending the approaches to other programming languages. Finally, 
certain systems do not provide concise and readable snippets as their source 
code summarization capabilities are limited. 

In this work, we present a novel API usage mining system, CLAMS, to over- 
come the above limitations. CLAMS employs clustering to group similar snippets 
and the output examples are subsequently improved using a summarization algo- 
rithm. The algorithm performs heuristic transformations, such as variable type 
resolution and replacement of literals, while it also removes non-API statements, 
in order to output concise and readable snippets. Finally, the snippets are ranked 
in descending order of support and given along with comprehensive comments. 


3 Methodology 


3.1 System Overview 


The architecture of the system is shown in Fig. 1. The input for each library is a 
set of Client Files and the API of the library. The API Call Extractor generates 
a list of API call sequences from each method. The Clustering Preprocessor 
computes a distance matrix of the sequences, which is used by the Clustering 
Engine to cluster them. After that, the top (most representative) sequences from 
each cluster are selected (Clustering Postprocessor). The source code and the 
ASTs (from the AST Extractor) of these top snippets are given to the Snippet 
Generator that generates a summarized snippet for each of them. Finally, the 
Snippet Selector selects a single snippet from each cluster, and the output is 
given by the Ranker that ranks the examples in descending order of support. 

Fig. 1. Overview of the proposed system. 


3.2 Preprocessing Module 


The Preprocessing Module receives as input the client source code files and 
extracts their ASTs and their API call sequences. The AST Extractor employs 
srcML [8] to convert source code to an XML AST format, while the API Call 
Extractor extracts the API call sequences using the extractor provided by Fowkes 
and Sutton [9] which uses the Eclipse JDT parser to extract method calls using 
depth-first AST traversal. 


3.3 Clustering Module 


We perform clustering at sequence-level, instead of source code-level, this way 
considering all useful API information contained in the snippets. As an example, 
the snippets in Figs. 2a and 2b would be clustered together by our Clustering 
Engine as they contain the same API call sequence. Given the large number and 
the diversity of the files, our approach is more effective than a clustering that 
would consider the structure of the client code, while such a decision makes the 
deployment to new languages easier. Note however that we take into considera- 
tion the structure of clustered snippets at a later stage (see Sect. 3.5). 


(a)

editor.putString("", tkn.getToken());
editor.putString("", tkn.getTokenSecret());

(b)

if (token != null) {
    editor.putString("", token.getToken());
    editor.putString("", token.getTokenSecret());
}

Fig. 2. The sample client code on the left side contains the same API calls as the 
client code on the right side; the API calls are encircled in both snippets. 


Our clustering methodology involves first generating a distance matrix and 
then clustering the sequences using this matrix. The Clustering Preprocessor 
uses the Longest Common Subsequence (LCS) between any two sequences in 
order to compute their distance and then create the distance matrix. Given two 
sequences S_1 and S_2, their LCS distance is defined as: 

$$\mathit{LCS}_{dist}(S_1, S_2) = 1 - 2 \cdot \frac{|\mathit{LCS}(S_1, S_2)|}{|S_1| + |S_2|} \tag{1}$$

where |S_1| and |S_2| are the lengths of S_1 and S_2, and |LCS(S_1, S_2)| is the length 
of their LCS. Given the distance matrix, the Clustering Engine explores the 
k-medoids algorithm, based on the implementation provided by Bauckhage [3], 
and the hierarchical version of DBSCAN, known as HDBSCAN [7], using the 
implementation provided by McInnes et al. [17]. 
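Equation (1) and the distance matrix built from it can be illustrated with a minimal Java sketch. It assumes that each API call sequence is a List<String> of method names and uses the textbook quadratic dynamic-programming LCS; the class and method names (LcsDistance, lcsLength, distance, distanceMatrix) are hypothetical and this is not the CLAMS implementation.

```java
import java.util.List;

final class LcsDistance {

    /** Length of the longest common subsequence of two API call sequences (O(n*m) DP). */
    static int lcsLength(List<String> s1, List<String> s2) {
        int[][] dp = new int[s1.size() + 1][s2.size() + 1];
        for (int i = 1; i <= s1.size(); i++) {
            for (int j = 1; j <= s2.size(); j++) {
                if (s1.get(i - 1).equals(s2.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[s1.size()][s2.size()];
    }

    /** LCS distance 1 - 2|LCS(S1, S2)| / (|S1| + |S2|), which lies in [0, 1]. */
    static double distance(List<String> s1, List<String> s2) {
        int total = s1.size() + s2.size();
        if (total == 0) {
            return 0.0; // assumption: two empty sequences are treated as identical
        }
        return 1.0 - 2.0 * lcsLength(s1, s2) / total;
    }

    /** Symmetric distance matrix over all sequences, with zeros on the diagonal. */
    static double[][] distanceMatrix(List<List<String>> seqs) {
        int n = seqs.size();
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                d[i][j] = d[j][i] = distance(seqs.get(i), seqs.get(j));
            }
        }
        return d;
    }

    public static void main(String[] args) {
        List<String> a = List.of("Status.getUser", "Status.getText");
        List<String> b = List.of("Status.getUser", "Status.getId", "Status.getText");
        // The LCS of a and b has length 2, so the distance is 1 - 4/5.
        System.out.println(distance(a, b));
    }
}
```

The resulting matrix is the kind of precomputed-distance input that k-medoids or HDBSCAN implementations accept; identical sequences get distance 0 and sequences with no common call get distance 1.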

The next step is to retrieve the source code associated with the most rep- 
resentative sequence of each cluster (Clustering Postprocessor). Given, however, 
that each cluster may contain several snippets that are identical with respect to 
their sequences, we select multiple snippets for each cluster, this way retaining 
source code structure information, which shall be useful for selecting a single 
snippet (see Sect. 3.5). Since our analysis showed that selecting all possible 
snippets, or any value of n higher than 5, did not further improve the results, we 
select n = 5 snippets per cluster in our experiments. 


3.4 Snippet Generator 


The Snippet Generator generates a summarized version for the top snippets. 
Our summarization method, a static, flow-insensitive, intra-procedural slicing 
approach, is presented in Fig. 3. The input (Fig. 3, top left) is the snippet source 
code, the list of its invoked API calls and a set of variables defined in its outer 
scope (encircled and highlighted in bold respectively). 

First, any comments are removed and literals are replaced by their srcML 
type, i.e. string, char, number or boolean (Step 1). In Step 2, the algorithm 
creates two lists, one for API and one for non-API statements (highlighted in 
bold), based on whether an API method is invoked or not in each statement. Any 
control flow statements that include API statements in their code block are also 
retained (e.g. the else statement in Fig. 3). In Step 3, the algorithm creates a list 
with all the variables that reside in the local scope of the snippet (highlighted 
in bold). This is followed by the removal of all non-API statements (Step 4), by 
traversing the AST in reverse (bottom-up) order. 

In Step 5, the list of declared variables is filtered, and only those used in 
the summarized tree are retained (highlighted in bold). Moreover, the algorithm 
creates a list with all the variables that are declared in API statements and used 
only in non-API statements (encircled). In Step 6, the algorithm adds declara- 
tions (encircled) for the variables retrieved in Step 5. Furthermore, descriptive 
comments of the form “Do something with variable” (highlighted in bold) are 
added for the variables that are declared in API statements and used in non-API 
statements (retrieved also in Step 5). Finally, the algorithm adds “Do something” 
comments in any empty blocks (highlighted in italics). 
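The core of Steps 2 and 4, together with the "Do something" comments that Step 6 adds to empty blocks, can be sketched on a toy statement tree. This is a hypothetical model, not srcML: each statement carries a hand-set flag saying whether it invokes the target API, and blocks (if/else bodies) hold child statements. Variable tracking and declaration insertion (Steps 3, 5 and the rest of Step 6) are deliberately omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class SummarizerSketch {

    /** Toy AST node (hypothetical): a simple statement when body is null, otherwise a block. */
    record Stmt(String code, boolean callsApi, List<Stmt> body) {
        static Stmt simple(String code, boolean callsApi) {
            return new Stmt(code, callsApi, null);
        }
        static Stmt block(String header, boolean callsApi, Stmt... body) {
            return new Stmt(header, callsApi, List.of(body));
        }
        boolean isBlock() {
            return body != null;
        }
    }

    /** Post-order (bottom-up) pass: returns the summarized statement, or null if removed. */
    static Stmt summarize(Stmt s) {
        if (!s.isBlock()) {
            // Steps 2 and 4: a simple statement survives only if it invokes an API method.
            return s.callsApi() ? s : null;
        }
        List<Stmt> kept = new ArrayList<>();
        for (Stmt child : s.body()) {
            Stmt summarized = summarize(child);
            if (summarized != null) {
                kept.add(summarized);
            }
        }
        // Control flow is retained if its header calls the API or its block still holds API statements.
        if (!s.callsApi() && kept.isEmpty()) {
            return null;
        }
        if (kept.isEmpty()) {
            kept.add(Stmt.simple("// Do something", false)); // Step 6: comment empty blocks
        }
        return new Stmt(s.code(), s.callsApi(), kept);
    }

    static void render(Stmt s, String indent, StringBuilder out) {
        out.append(indent).append(s.code());
        if (!s.isBlock()) {
            out.append('\n');
            return;
        }
        out.append(" {\n");
        for (Stmt child : s.body()) {
            render(child, indent + "    ", out);
        }
        out.append(indent).append("}\n");
    }

    static String summarizeAll(List<Stmt> stmts) {
        StringBuilder out = new StringBuilder();
        for (Stmt s : stmts) {
            Stmt summarized = summarize(s);
            if (summarized != null) {
                render(summarized, "", out);
            }
        }
        return out.toString();
    }

    /** The Step 1 snippet of Fig. 3, with API usage flagged by hand. */
    static List<Stmt> figure3() {
        return List.of(
            Stmt.block("if (t.getCreatedAt().getTime() + number < mTime)", true,
                Stmt.simple("breakPaging = char;", false)),
            Stmt.block("else", false,
                Stmt.simple("userName = t.getFromUser().toLowerCase();", true),
                Stmt.simple("JUser user = userMap.get(userName);", false),
                Stmt.block("if (user == null)", false,
                    Stmt.simple("user = new JUser(userName).init(t);", false),
                    Stmt.simple("userMap.put(userName, user);", false))));
    }

    public static void main(String[] args) {
        System.out.print(summarizeAll(figure3()));
    }
}
```

Because children are summarized before their parent, the inner `if (user == null)` block disappears once both of its non-API statements are removed, while the outer `if` survives with an empty body because its condition calls the API.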

Finally, note that our approach is considerably simpler than static, syntax-preserving 
slicing. E.g., static slicing would not remove any of the statements inside the 
else block, as the call to the getFromUser API method is assigned to a variable 
(userName), which is then used in the assignment of user. Our approach, on the 
other hand, performs a single pass over the AST, thus ensuring lower complexity, 
which in turn reduces the overall complexity of our system. 
Fig. 3 panels: Summarizer Input; Step 1: Preprocess comments and literals; Step 2: 
Identify API statements; Step 3: Retrieve local scope variables; Step 4: Remove 
non-API statements; Step 5: Filtering variables; Step 6: Add declaration statements 
and comments. 

Step 1 output (comments removed, literals replaced by their srcML types): 

if (t.getCreatedAt().getTime() + number < mTime) {
    breakPaging = char;
} else {
    userName = t.getFromUser().toLowerCase();
    JUser user = userMap.get(userName);
    if (user == null) {
        user = new JUser(userName).init(t);
        userMap.put(userName, user);
    }
}

Step 4 output (non-API statements removed): 

if (t.getCreatedAt().getTime() + number < mTime) {
} else {
    userName = t.getFromUser().toLowerCase();
}

Fig. 3. Example summarization of source code snippet. 


3.5 Snippet Selector 


The next step is to select a single snippet for each cluster. Given that the selected 
snippet has to be the most representative of the cluster, we select the one that 
is most similar to the other top snippets. The score between any two snippets is 
defined as the tree edit distance between their ASTs, computed using the 
APTED algorithm [21]. Given this metric, we create a matrix for each cluster, which 
contains the distance between any two top snippets of the cluster. Finally, we 
select the snippet with the minimum sum of distances in each cluster’s matrix. 
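The selection rule amounts to picking the medoid of the cluster's top snippets. The following minimal Java sketch assumes the tree edit distances (computed with APTED in the text) are already available as a symmetric matrix; the class and method names are hypothetical and the tie-breaking rule is an arbitrary choice.

```java
final class SnippetSelector {

    /**
     * Returns the index of the snippet with the minimum sum of distances to the
     * other top snippets of its cluster, or -1 for an empty matrix.
     */
    static int selectRepresentative(double[][] distances) {
        int best = -1;
        double bestSum = Double.POSITIVE_INFINITY;
        for (int i = 0; i < distances.length; i++) {
            double sum = 0.0;
            for (double d : distances[i]) {
                sum += d;
            }
            if (sum < bestSum) { // strict comparison: ties keep the earliest snippet
                bestSum = sum;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical tree edit distances between three top snippets of one cluster.
        double[][] ted = {
            {0, 4, 6},
            {4, 0, 3},
            {6, 3, 0}
        };
        // Row sums are 10, 7 and 9, so snippet 1 is selected.
        System.out.println(selectRepresentative(ted));
    }
}
```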


3.6 Ranker 


We rank the snippets according to the support of their API call sequences, as 
in [9]. Specifically, if the API call sequence of a snippet is a subsequence of the 
sequence of a file in the repository, then we say that the file supports the snippet. 
For example, the snippet with API call sequence [twitter4j.Status.getUser, 
twitter4j.Status.getText] is supported by a file with sequence [twitter4j.Paging.<init>, 
twitter4j.Status.getUser, twitter4j.Status.getId, twitter4j.Status.getText, 
twitter4j.Status.getUser]. In this way, we compute the support for each snippet and create 
a complete ordering. Upon ordering the snippets, the AStyle formatter [2] is also 
used to fix the indentation and spacing. 
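The support computation and the resulting ordering can be sketched in Java as follows, assuming each file and each snippet is represented by its API call sequence as a List<String>. The class and method names are hypothetical; a greedy left-to-right scan is sufficient to test for a (not necessarily contiguous) subsequence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SupportRanker {

    /** True if every call of pattern appears in sequence, in order (gaps allowed). */
    static boolean isSubsequence(List<String> pattern, List<String> sequence) {
        int matched = 0;
        for (String call : sequence) {
            if (matched < pattern.size() && pattern.get(matched).equals(call)) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    /** Support: number of files whose API call sequence contains the snippet's sequence. */
    static int support(List<String> snippetSeq, List<List<String>> fileSeqs) {
        int count = 0;
        for (List<String> fileSeq : fileSeqs) {
            if (isSubsequence(snippetSeq, fileSeq)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the snippet sequences in descending order of support. */
    static List<List<String>> rank(List<List<String>> snippetSeqs, List<List<String>> fileSeqs) {
        Map<List<String>, Integer> supportOf = new HashMap<>();
        for (List<String> seq : snippetSeqs) {
            supportOf.put(seq, support(seq, fileSeqs));
        }
        List<List<String>> ranked = new ArrayList<>(snippetSeqs);
        // List.sort is stable, so snippets with equal support keep their input order.
        ranked.sort(Comparator.comparingInt((List<String> seq) -> supportOf.get(seq)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<String> snippet = List.of("twitter4j.Status.getUser", "twitter4j.Status.getText");
        List<String> file = List.of("twitter4j.Paging.<init>", "twitter4j.Status.getUser",
                "twitter4j.Status.getId", "twitter4j.Status.getText", "twitter4j.Status.getUser");
        System.out.println(isSubsequence(snippet, file)); // the example from the text
    }
}
```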


3.7 Deploying to New Languages 


Our methodology can be easily applied to different programming languages. The 
Preprocessing Module and the Snippet Selector make use of the source code’s 
AST, which is straightforward to extract in different languages. The Clustering 
Module and the Ranker use API call sequences and not any semantic features 
that are language-specific, while our summarization algorithm relies on state- 
ments and their control flow, a fundamental concept of imperative languages. 
Thus, extending our methodology to additional programming languages requires 
only the extraction of the AST of the source code, which can be done using appropriate 
tools (e.g. srcML), and possibly a minor adjustment to our summarization 
algorithm to conform to the AST schema extracted by different tools. 


4 Evaluation 


4.1 Evaluation Framework 


We evaluate CLAMS on the APIs (all public methods) of 6 Java libraries, 
which were selected as they are popular (based on their GitHub stars and forks), 
cover various domains, and have handwritten examples to compare our snippets 
with. The libraries are shown in Table 1, along with certain statistics concerning 
the lines of code of their examples’ directories (Example LOC) and the lines of 
code considered from GitHub as using their API methods (Client LOC). 


Table 1. Summary of the evaluation dataset. 


Project              Package Name         Client LOC   Example LOC
Apache Camel         org.apache.camel        141,454        15,256
Drools               org.drools              187,809        15,390
Restlet Framework    org.restlet             208,395        41,078
Twitter4j            twitter4j                96,020         6,560
Project Wonder       com.webobjects          375,064        37,181
Apache Wicket        org.apache.wicket       564,418        33,025


To further strengthen our hypothesis, we also employ an automated method 
for evaluating our system, to allow quantitative comparison of its different vari- 
ants. To assess whether the snippets of CLAMS are representative, we look for 
“gold standard” examples online, as writing our own examples would be time- 
consuming and lead to subjective results. 
We focus our evaluation on the 4 research questions of Fig. 4. RQ1 and RQ2 
refer to summarization and clustering, respectively, and will be evaluated with 
respect to handwritten examples. For RQ3, we assess the API coverage achieved 
by CLAMS versus that achieved by the API mining systems MAPO [32,33] 
and UP-Miner [31]. RQ4 will determine whether the extra information of source 
code snippets when compared to API call sequences is useful to developers. 


RQ1: How much more concise, readable, and precise with respect to handwritten 
examples are the snippets after summarization? 

RQ2: Do more powerful clustering techniques, that cluster similar rather than identi- 
cal sequences, lead to snippets that more closely match handwritten examples? 

RQ3: Does our tool mine more diverse patterns than other existing approaches? 

RQ4: Do snippets match handwritten examples more than API call sequences? 


Fig. 4. Research Questions (RQs) to be evaluated. 


We consider four configurations for our system: NaiveNoSum, NaiveSum, 
KMedoidsSum, and HDBSCANSum. To reveal the effect of clustering sequences, 
the first two configurations do not use any clustering and only group identical 
sequences together, while the last two use the k-medoids and the HDBSCAN 
algorithms, respectively. Also the first configuration (NaiveNoSum) does not 
employ our summarizer, while all others do, so that we can measure its effect. 

We define metrics to assess the readability, conciseness, and quality of the 
returned snippets. For readability, we use the metric defined by Buse and Weimer 
[6] which is based on human studies and agrees with a large set of human anno- 
tators. Given a Java source code file, the tool provided by Buse and Weimer 
[27] outputs a value in the range [0.0, 1.0], where a higher value indicates a 
more readable snippet. For conciseness, we use the number of Physical Lines 
of Code (PLOCs). Both metrics have already been used for the evaluation of 
similar systems [5]. For quality, as a proxy measure we use the similarity of the 
set of returned snippets to a set of handwritten examples from the module’s 
developers. 

We define the similarity of a snippet s given a set of examples E as snippet 
precision. First, we define a set E_s with all the examples in E that have exactly 
the same API calls as snippet s. After that, we compute the similarity of s 
with all matching examples e ∈ E_s, by splitting the code into sets of tokens and 
applying set similarity metrics¹. Tokenization is performed using a Java code 
tokenizer and the tokens are cleaned by removing symbols (e.g. brackets) 
and comments, and by replacing literals (e.g. numbers) with their respective 
types. The precision of s is the maximum of its similarities with all e ∈ E_s: 

$$\mathit{Prec}(s) = \max_{e \in E_s} \left\{ \frac{|T_s \cap T_e|}{|T_s|} \right\} \tag{2}$$

where T_s and T_e are the sets of tokens of the snippet s and of the example e, 
respectively. Finally, if no example has exactly the same API calls as the snippet 
(i.e. E_s = ∅), then snippet precision is set to zero. Given the snippet precision, 
we also define the average snippet precision for n snippets s_1, s_2, ..., s_n as: 

$$\mathit{AvgPrec}(n) = \frac{1}{n} \sum_{i=1}^{n} \mathit{Prec}(s_i) \tag{3}$$

¹ Our decision to apply set similarity metrics instead of an edit distance metric is 
based on the fact that the latter is heavily affected, and can be easily skewed, 
by the order of the statements in the source code (e.g. nesting levels), while it 
would not provide a fair comparison between snippets and sequences. 


Similarly, average snippet precision at top k can be defined as: 


$$\mathit{AvgPrec}@k = \frac{1}{k} \sum_{j=1}^{k} \mathit{Prec}@j, \quad \text{where} \quad \mathit{Prec}@j = \frac{1}{j} \sum_{i=1}^{j} \mathit{Prec}(s_i) \tag{4}$$


This metric is useful for evaluating our system which outputs ordered results, as 
it allows us to illustrate and draw conclusions for precision at different levels. 
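Snippet precision and average precision at top k can be computed directly from token sets. This minimal Java sketch assumes that snippet precision has the form Prec(s) = max over e in E_s of |T_s ∩ T_e| / |T_s| (the share of the snippet's tokens found in its best-matching example), that tokenization has already produced the sets, and that precisions are listed in rank order; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SnippetPrecision {

    /** Max over matching examples of |Ts ∩ Te| / |Ts|; 0 when no example matches. */
    static double precision(Set<String> snippetTokens, List<Set<String>> matchingExamples) {
        if (snippetTokens.isEmpty()) {
            return 0.0; // assumption: guard against an empty token set
        }
        double best = 0.0; // stays 0 when E_s is empty
        for (Set<String> example : matchingExamples) {
            Set<String> common = new HashSet<>(snippetTokens);
            common.retainAll(example);
            best = Math.max(best, (double) common.size() / snippetTokens.size());
        }
        return best;
    }

    /** Prec@j: mean precision of the top j ranked snippets (j starts at 1). */
    static double precAt(double[] precisions, int j) {
        double sum = 0.0;
        for (int i = 0; i < j; i++) {
            sum += precisions[i];
        }
        return sum / j;
    }

    /** AvgPrec@k: mean of Prec@1, ..., Prec@k. */
    static double avgPrecAt(double[] precisions, int k) {
        double sum = 0.0;
        for (int j = 1; j <= k; j++) {
            sum += precAt(precisions, j);
        }
        return sum / k;
    }

    public static void main(String[] args) {
        double[] ranked = {1.0, 0.5, 0.0}; // hypothetical precisions of the top 3 snippets
        // Prec@1 = 1.0 and Prec@2 = 0.75, so AvgPrec@2 = 0.875.
        System.out.println(avgPrecAt(ranked, 2));
    }
}
```

Averaging Prec@j over j gives more weight to the highest-ranked snippets, since Prec(s_1) contributes to every Prec@j, while Prec(s_k) contributes only to Prec@k.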
We also define coverage at k as the number of unique API methods contained 
in the top k snippets. This metric has already been defined in a similar manner by 
Fowkes and Sutton [9], who claim that a list of patterns with identical methods 
would be redundant, non-diverse, and thus not representative of the target API. 
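Coverage at k is a set-union count over the ranked snippets. The following minimal Java sketch assumes each snippet is represented by the list of API methods it calls, given in rank order; the class name and the example method lists are illustrative only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Coverage {

    /** Number of unique API methods used by the top k snippets (k larger than the list is allowed). */
    static int coverageAt(List<List<String>> rankedSnippetCalls, int k) {
        Set<String> methods = new HashSet<>();
        int limit = Math.min(k, rankedSnippetCalls.size());
        for (List<String> calls : rankedSnippetCalls.subList(0, limit)) {
            methods.addAll(calls);
        }
        return methods.size();
    }

    public static void main(String[] args) {
        List<List<String>> ranked = List.of(
            List.of("TwitterFactory.<init>", "TwitterFactory.getInstance"),
            List.of("TwitterFactory.<init>", "TwitterFactory.getInstance", "Twitter.setOAuthConsumer"),
            List.of("Status.getUser", "Status.getText"));
        // The second snippet adds only one new method, so coverage grows 2, 3, 5.
        for (int k = 1; k <= 3; k++) {
            System.out.println(coverageAt(ranked, k));
        }
    }
}
```

A list in which every snippet repeats the same methods keeps coverage flat as k grows, which is exactly the redundancy this metric is meant to expose.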
Finally, we measure the additional information provided by source code snippets 
when compared with API call sequences. For each snippet we extract its snippet-tokens 
T_s, as defined in (2), and its sequence-tokens T'_s, which are extracted from 
the underlying API call sequence of the snippet, where each token is the name 
of an API method. Based on these sets, we define the additional info metric as: 

$$\mathit{AdditInfo} = \frac{1}{m} \sum_{i=1}^{m} \frac{\max_{e \in E_{s_i}} \{ |T_{s_i} \cap T_e| \}}{\max_{e \in E_{s_i}} \{ |T'_{s_i} \cap T_e| \}} \tag{5}$$

where m is the number of snippets that match at least one example. 
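The additional info metric can be computed as sketched below in Java. The sketch assumes that the token sets and each snippet's matching examples E_s are already available; the Snippet record and all method names are hypothetical. Skipping snippets whose denominator is zero is our own assumption, since that case is not addressed above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AdditionalInfo {

    /** Hypothetical container: snippet-tokens, sequence-tokens, and token sets of E_s. */
    record Snippet(Set<String> snippetTokens, Set<String> sequenceTokens,
                   List<Set<String>> matchingExamples) {}

    /** Max over examples e of |tokens ∩ T_e|. */
    static int maxOverlap(Set<String> tokens, List<Set<String>> examples) {
        int best = 0;
        for (Set<String> example : examples) {
            Set<String> common = new HashSet<>(tokens);
            common.retainAll(example);
            best = Math.max(best, common.size());
        }
        return best;
    }

    /** Mean ratio of snippet-token overlap to sequence-token overlap over matched snippets. */
    static double additInfo(List<Snippet> snippets) {
        double sum = 0.0;
        int m = 0;
        for (Snippet s : snippets) {
            if (s.matchingExamples().isEmpty()) {
                continue; // unmatched snippets are not among the m snippets
            }
            int denominator = maxOverlap(s.sequenceTokens(), s.matchingExamples());
            if (denominator == 0) {
                continue; // assumption: skip snippets whose ratio would be undefined
            }
            sum += (double) maxOverlap(s.snippetTokens(), s.matchingExamples()) / denominator;
            m++;
        }
        return m == 0 ? 0.0 : sum / m;
    }

    public static void main(String[] args) {
        List<Snippet> snippets = List.of(
            new Snippet(Set.of("a", "b", "c", "d"), Set.of("a"), List.of(Set.of("a", "b", "c"))),
            new Snippet(Set.of("x", "y"), Set.of("x"), List.of(Set.of("x", "y"))),
            new Snippet(Set.of("z"), Set.of("z"), List.of()));
        // Ratios 3/1 and 2/1 over m = 2 matched snippets.
        System.out.println(additInfo(snippets));
    }
}
```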


4.2 Evaluation Results 


RQ1: How much more concise, readable, and precise with respect to 
handwritten examples are the snippets after summarization? We eval- 
uate how much reduction in the size of the snippets is achieved by the summa- 
rization algorithm, and the effect of summarization on the precision with respect 
to handwritten examples. If snippets have high or higher precision after summa- 
rization, then this indicates that the tokens removed by summarization are ones 
that do not typically appear in handwritten examples, and thus are possibly less 
relevant. For this purpose, we use the first two versions of our system, namely the 
NaiveSum and the NaiveNoSum versions. Both of them use the naive clustering 
technique, where only identical sequences are clustered together. Figures 5a and 
5b depict the average readability of the snippets mined for each library and the 
average PLOCs, respectively. The readability of the mined snippets is almost 
doubled when performing summarization, while the snippets generated by the 
NaiveSum version are clearly smaller than those mined by NaiveNoSum. In fact, 
the majority of the snippets of NaiveSum contain fewer than 10 PLOCs, owing 
mainly to the algorithm's removal of non-API statements. On average, the 
summarization algorithm leads to 40% fewer PLOCs. Thus, we may argue that 
the snippets provided by our summarizer are readable and concise. 

Fig. 5. Figures of (a) the average readability, and (b) the average PLOCs of the 
snippets, for each library, with (NaiveSum) and without (NaiveNoSum) summarization. 

Apart from readability and conciseness, which are both regarded as highly 
desirable features [26], we further assess whether the summarizer produces snip- 
pets that closely match handwritten examples. Therefore, we plot the snippet 
precision at top k, in Fig. 6a. The plot indicates a downward trend in precision 
for both configurations, which is explained by the fact that the snippets of lower 
positions are more complex, as they normally contain a large number of API 
calls. In any case, it is clear that the version that uses the summarizer mines 
more precise snippets than the one not using it, for any value of k. E.g., for 
k = 10, the summarizer increases snippet precision from 0.27 to 0.35, indicating 
that no useful statements are removed and no irrelevant statements are added. 


RQ2: Do more powerful clustering techniques, that cluster similar 
rather than identical sequences, lead to snippets that more closely 
match handwritten examples? In this experiment we compare NaiveSum, 
KMedoidsSum, and HDBSCANSum to assess the effect of applying different 
clustering techniques on the snippets. In order for the comparison to be fair, we 
use the same number of clusters for both k-medoids and HDBSCAN. Therefore, 
we first run HDBSCAN (setting its min_cluster_size parameter to 2), and then 
use the number of clusters generated by the algorithm for k-medoids. After that, 
we consider the top k results of the three versions, so that the comparison with 
the Naive method (that cannot be tuned) is also fair. Hence, we plot precision 
against coverage, in a similar manner to precision versus recall graphs. For this, 
we use the snippet precision at k and coverage at k, along with an 
interpolated version of the curve, where the precision value at each point is the 
maximum for the corresponding coverage value. Figure 6b depicts the curve for 
the top 100 snippets, where the areas under the curves are shaded. Area A2 
reveals the additional coverage in API methods achieved by HDBSCANSum, 
when compared to NaiveSum (A1), while A3 shows the corresponding additional 
coverage of KMedoidsSum, when compared to HDBSCANSum (A2). 

Fig. 6. Figures of (a) precision at top k, with (NaiveSum) or without (NaiveNoSum) 
summarization, and (b) the average interpolated snippet precision versus API coverage 
for three system versions (clustering algorithms), using the top 100 mined snippets. 

NaiveSum achieves slightly better precision than the versions using clustering, 
which is expected as most of its top snippets use the same API calls, and 
contain only a few API methods. As a consequence, however, its coverage is 
quite low, due to the fact that only identical sequences are grouped together. 
Given that coverage is considered quite important when mining API usage exam- 
ples [31], and that precision among all three configurations is similar, we may 
argue that KMedoidsSum and HDBSCANSum produce sufficiently precise and 
also more varied results for the developer. The differences between these two 
methods are mostly related to the separation among the clusters; the clusters 
created by KMedoidsSum are more separated and thus it achieves higher cover- 
age, whereas HDBSCANSum has slightly higher precision. To achieve a trade-off 
between precision and coverage, we select HDBSCANSum for the last two RQs. 
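The interpolated precision-versus-coverage curve used for this comparison can be sketched in Java as follows. The sketch reads "the maximum for the corresponding coverage value" as a per-coverage maximum over all k that reach that coverage; an alternative reading, common for interpolated precision-recall curves, would take the maximum over all points with coverage at least as large. The class and method names are hypothetical.

```java
import java.util.TreeMap;

final class InterpolatedCurve {

    /**
     * Given precision at k and coverage at k for k = 1..K (index 0 is k = 1),
     * keeps, for each coverage value, the maximum precision observed at it.
     */
    static TreeMap<Integer, Double> interpolate(double[] precAtK, int[] coverageAtK) {
        TreeMap<Integer, Double> curve = new TreeMap<>(); // sorted by coverage for plotting
        for (int k = 0; k < precAtK.length; k++) {
            curve.merge(coverageAtK[k], precAtK[k], (a, b) -> Math.max(a, b));
        }
        return curve;
    }

    public static void main(String[] args) {
        double[] prec = {1.0, 0.8, 0.7, 0.9, 0.6};
        int[] cov = {2, 2, 3, 3, 5};
        System.out.println(interpolate(prec, cov)); // one point per distinct coverage value
    }
}
```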


RQ3: Does our tool mine more diverse patterns than other existing 
approaches? For this research question, we compare the diversity of the 
examples of CLAMS with that of two API mining approaches, MAPO [32,33] and 
UP-Miner [31], which were deemed most similar to our approach from a mining 
perspective (as CLAMS also works at sequence level)². We measure diversity using 
the coverage at k. Figure 7a depicts the coverage in API methods for each approach 
and each library, while Fig. 7b shows the average number of API methods 
covered at top k, using the top 100 examples of each approach. 


² Comparing with other tools was also difficult, as most are unavailable, e.g. the 
eXoaDocs web app (http://exoa.postech.ac.kr/) or the APIMiner website 
(http://java.labsoft.dcc.ufmg.br/apimineride/resources/docs/reference/). 
Fig. 7. Graphs of the coverage in API methods achieved by CLAMS, MAPO, and 
UP-Miner, (a) for each project, and (b) on average, at top k, using the top 100 examples. 


The coverage by MAPO and UP-Miner is quite low, which is expected since 
both tools perform frequent sequence mining, thus generating several redundant 
patterns, a limitation noted also by Fowkes and Sutton [9]. On the other hand, 
our system integrates clustering techniques to reduce redundancy, which is further 
reduced by the fact that we select a single snippet from each cluster 
(Snippet Selector). Finally, the average coverage trend (Fig. 7b) indicates that 
our tool mines more diverse sequences than the other two tools, regardless of 
the number of examples. 


RQ4: Do source code snippets match handwritten examples more than 
API call sequences? Naturally, source code snippets contain more tokens than 
API call sequences, but the additional tokens might not be useful. Therefore, we 
measure specifically whether the additional tokens that appear in snippets rather 
than sequences also appear in handwritten examples. Computing the average of 
the additional info metric for each library, we find that the average ratio between 
snippets-tokens and sequence-tokens, that are shared between snippets and cor- 
responding examples, is 2.75. This means that presenting snippets instead of 
sequences yields 2.75 times as much information shared with the handwritten 
examples. By further plotting the additional information of the snippets for each 
library in Fig. 8, we observe that snippets almost always provide at least twice as 
much valuable information. 
To further illustrate the contrast between snippets and sequences, we present 
an indicative snippet mined by CLAMS in Fig. 9. Note, e.g., how the try/catch 
tokens are important, yet are not included in the sequence tokens. 

Finally, we present the top 5 usage examples mined by CLAMS, MAPO and 
UP-Miner, in Fig. 10. As one may observe, snippets provide useful information 
that is missing from sequences, including identifiers (e.g. String secret), control 
flow statements (e.g. if-then-else statements), etc. Moreover, snippets are easier 
to integrate into the source code of the developer, and thus facilitate reuse. 


202 N. Katirtzis et al. 


[Fig. 8 plot: average number of common tokens (y-axis) for each library (Apache Camel, 
Drools, Restlet Framework, Twitter4j, Project Wonder, Apache Wicket), showing 
sequence-tokens and additional snippet-tokens.] 

Fig. 8. Additional information revealed when mining snippets instead of sequences. 


AccessToken accessToken;
String oauthToken;
String oAuthVerifier;
Twitter twitter;
try {
    accessToken = twitter.…
    // Do something with accessToken
} catch (TwitterException e) {
    e.printStackTrace();
}

Fig. 9. Example snippet matched to handwritten example. Sequence-tokens are encircled 
and additional snippet-tokens are highlighted in bold. 

(a)

Twitter mTwitter;
mTwitter = new TwitterFactory().getInstance();
// Do something with mTwitter

Twitter mTwitter;
final String CONSUMER_KEY;
final String CONSUMER_SECRET;
mTwitter = new TwitterFactory().getInstance();
mTwitter.setOAuthConsumer(CONSUMER_KEY, CONSUMER_SECRET);

BasicDBObject tweet;
Status status;
tweet.put(string, status.getUser().getScreenName());
tweet.put(string, status.getText());

String mConsumerKey;
Twitter mTwitter;
AccessToken mAccessToken;
String mSecretKey;
if (mAccessToken != null) {
    mTwitter.setOAuthConsumer(mConsumerKey, mSecretKey);
    mTwitter.setOAuthAccessToken(mAccessToken);
}

Twitter mTwitter;
String token;
String secret;
AccessToken at = new AccessToken(token, secret);
mTwitter.setOAuthAccessToken(at);

(b)

TwitterFactory.<init>
TwitterFactory.getInstance

Status.getUser
Status.getText

ConfigurationBuilder.<init>
ConfigurationBuilder.build

ConfigurationBuilder.<init>
TwitterFactory.<init>

ConfigurationBuilder.<init>
ConfigurationBuilder.setOAuthConsumerKey

(c)

TwitterFactory.getInstance
Twitter.setOAuthConsumer

TwitterFactory.<init>
TwitterFactory.getInstance
Twitter.setOAuthConsumer

Status.getUser
Status.getUser

ConfigurationBuilder.<init>
ConfigurationBuilder.build
TwitterFactory.<init>

ConfigurationBuilder.<init>
ConfigurationBuilder.build
TwitterFactory.<init>
TwitterFactory.getInstance

Fig. 10. Top 5 usage examples mined by (a) CLAMS, (b) MAPO, and (c) UP-Miner. 
The API methods for the examples of our system are highlighted. 
Interestingly, the snippet ranked second by CLAMS has not been matched to any handwritten example, although it has high support in the dataset. In fact, there is no example for the setOAuthConsumer method of Twitter4j, which is one of its most popular methods. This illustrates that CLAMS can also extract snippets beyond those of the examples directory, and these snippets are valuable to developers.


5 Threats to Validity 


The main threats to the validity of our approach involve the choice of the evaluation metrics and the lack of comparison with snippet-based approaches. Concerning the metrics, snippet API coverage is typical when comparing API usage mining approaches. On the other hand, the choice of metrics for measuring snippet quality is inherently subjective. To address this threat, we have employed three metrics: conciseness (PLOCs), readability, and quality (similarity to real examples). Our evaluation indicates that CLAMS is effective on all of these axes. In addition, as these metrics are applied on snippets, computing them for sequence-based systems such as MAPO and UP-Miner was not possible. Finally, to evaluate whether CLAMS can be practically useful when developing software, we plan to conduct a developer survey. To this end, we have already performed a preliminary study on a team of 5 Java developers of Hotels.com, the results of which were encouraging. More details about the study can be found at https://mast-group.github.io/clams/user-survey/ (omitted here due to space limitations).

Concerning the comparison with current approaches, we chose to compare CLAMS against sequence-based approaches (MAPO and UP-Miner), as the mining methodology is actually performed at sequence level. Nevertheless, comparing with snippet-based approaches would also be useful, not only as a proof of concept but also because it would allow us to comparatively evaluate CLAMS with regard to the snippet quality metrics mentioned in the previous paragraph. However, such a comparison was troublesome, as most existing tools (e.g., eXoaDocs and APIMiner) are currently unavailable (see RQ3 of Sect. 4.2). We consider this comparison an important point for future work, and we have uploaded our code and findings online (https://mast-group.github.io/clams/) to help future researchers who may face similar challenges.


6 Conclusion 


In this paper we have proposed a novel approach for mining API usage examples in the form of source code snippets from client code. Our system uses clustering techniques, as well as a summarization algorithm, to mine useful, concise, and readable snippets. Our evaluation shows that snippet clustering leads to a better precision versus coverage rate, while the summarization algorithm effectively increases the readability and decreases the size of the snippets. Finally, our tool offers diverse snippets that match handwritten examples better than sequences do.
In future work, we plan to extend the approach used to retrieve the top mined sequences from each cluster. We could use a two-stage clustering approach where, after clustering the API call sequences, we further cluster the snippets of each formed cluster using a tree edit distance metric. This would allow retrieving snippets that use the same API call sequence but differ in their structure.
Abstract. In 2011, Danicic et al. introduced an elegant generalization of the notion of control dependence for any directed graph. They also proposed an algorithm computing the weak control-closure of a subset of graph vertices and performed a paper-and-pencil proof of its correctness. We have performed its proof in the Coq proof assistant. This paper also presents a novel, more efficient algorithm to compute weak control-closure that takes advantage of intermediate propagation results of previous iterations in order to accelerate the following ones. This optimization makes the design and proof of the algorithm more complex and requires subtle loop invariants. The new algorithm has been formalized and mechanically proven in the Why3 verification tool. Experiments on randomly generated graphs with up to thousands of vertices demonstrate that the proposed algorithm remains practical for real-life programs and significantly outperforms Danicic's initial technique.


1 Introduction 


Context. Control dependence is a fundamental notion in software engineering and analysis (e.g. [6,12,13,21,22,27]). It reflects structural relationships between different program statements and is intensively used in many software analysis techniques and tools, such as compilers, verification tools, test generators, program transformation tools, simulators, and debuggers. Along with data dependence, it is one of the key notions used in program slicing [25,27], a program transformation technique that decomposes a given program into a simpler one, called a program slice.

In 2011, Danicic et al. [11] proposed an elegant generalization of the notions of closure under non-termination insensitive (weak) and non-termination sensitive (strong) control dependence. They introduced the notions of weak and strong control-closures, which can be defined on any directed graph, not only on control flow graphs. They proved that weak and strong control-closures subsume the closures under all forms of control dependence previously known in the literature. In the present paper, we are interested in the non-termination insensitive form, i.e. weak control-closure.


© The Author(s) 2018
A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 207-224, 2018.
https://doi.org/10.1007/978-3-319-89363-1_12


208 J.-C. Léchenet et al. 


Besides the definition of weak control-closure, Danicic et al. also provided an algorithm computing it for a given set of vertices in a directed graph. This algorithm was proved by paper-and-pencil. Under the assumption that the given graph is a CFG (or, more generally, that the maximal out-degree of the graph vertices is bounded), the complexity of the algorithm can be expressed in terms of the number of vertices $n$ of the graph, and was shown to be $O(n^3)$. Danicic et al. themselves suggested that it should be possible to improve its complexity. This may explain why this algorithm has not been used until now.


Motivation. Danicic et al. introduced basic notions used to define weak control-closure and to justify the algorithm, and proved a few lemmas about them. While formalizing these concepts in the Coq proof assistant [5,24], we discovered that, strictly speaking, the paper-and-pencil proof of one of them [11, Lemma 53] is inaccurate (a previously proven case is applied while its hypotheses are not satisfied), whereas the lemma itself is correct. Furthermore, Danicic's algorithm does not take advantage of its iterative nature and does not reuse the results of previous iterations in order to speed up the following ones.


Goals. First, we fully formalize Danicic's algorithm, its correctness proof and the underlying concepts in Coq. Our second objective is to design a more efficient algorithm that shares information between iterations to speed up the execution. Since our new algorithm is carefully optimized and more complex, its correctness proof relies on more subtle arguments than for Danicic's algorithm. To deal with them and to avoid any risk of error, we have again decided to use a mechanized verification tool (this time, the Why3 proof system [1,14]) to guarantee the correctness of the optimized version. Finally, in order to evaluate the new algorithm with respect to Danicic's initial technique, we have implemented both algorithms in OCaml (using the OCamlgraph library [9]) and tested them on a large set of randomly generated graphs with up to thousands of vertices. Experiments demonstrate that the proposed optimized algorithm is applicable to large graphs (and thus to CFGs of real-life programs) and significantly outperforms Danicic's original technique.


Contributions. The contributions of this paper include: 


— A formalization of Danicic’s algorithm and proof of its correctness in Coq; 

— A new algorithm computing weak control-closure that benefits from preserving some intermediate results between iterations;

— A mechanized correctness proof of this new algorithm in the Why3 tool, including a formalization of the basic concepts and results of Danicic et al.;

— An implementation of Danicic's and our algorithms in OCaml, their evaluation on random graphs and a comparison of their execution times.


The Coq, Why3 and OCaml implementations are all available in [17]. 


Outline. We present our motivation and a running example in Sect. 2. Then, we recall the definitions of some important concepts introduced by [11] in Sect. 3 and state two important lemmas in Sect. 4. Next, we describe Danicic's algorithm in Sect. 5 and our algorithm along with a sketch of the proof of its correctness in Sect. 6. Experiments are presented in Sect. 7. Finally, Sect. 8 presents some related work and concludes.


2 Motivation and Running Example 


This section informally presents weak control-closure using a running example.

The inputs of our problem are a directed graph $G = (V, E)$ with set of vertices (or nodes) $V$ and set of edges $E$, and a subset of vertices $V' \subseteq V$. The property of interest of such a subset is called weakly control-closed in [11] (cf. Definition 3). $V'$ is said to be weakly control-closed if the nodes reachable from $V'$ are $V'$-weakly committing (cf. Definition 2), i.e. always lead the flow to at most one node in $V'$. Since $V'$ does not necessarily satisfy this property, we want to build a superset of $V'$ satisfying it, and more particularly the smallest one, called the weak control-closure of $V'$ in $G$ (cf. Definition 5). For that, as will be proved by Lemma 2, we need to add to $V'$ the points of divergence closest to $V'$, called the $V'$-weakly deciding vertices, that are reachable from $V'$. Formally, vertex $u$ is $V'$-weakly deciding if there exist two non-trivial paths starting from $u$ and reaching $V'$ that have no common vertex except $u$ (cf. Definition 4).

Fig. 1. Example graph $G_0$, with $V_0' = \{u_1, u_3\}$

Let us illustrate these ideas on an example graph $G_0$ shown in Fig. 1. $V_0' = \{u_1, u_3\}$ is the subset of interest, represented with dashed double circles in Fig. 1. $u_5$ is reachable from $V_0'$ and is not $V_0'$-weakly committing, since it is the origin of two paths $u_5, u_6, u_0, u_1$ and $u_5, u_6, u_0, u_2, u_3$ that can lead the flow to two different nodes $u_1$ and $u_3$ in $V_0'$. Therefore, $V_0'$ is not weakly control-closed. To build the weak control-closure, we need to add to $V_0'$ all $V_0'$-weakly deciding nodes reachable from $V_0'$. $u_0$ is such a node. Indeed, it is reachable from $V_0'$ and we can build two non-trivial paths $u_0, u_1$ and $u_0, u_2, u_3$ starting from $u_0$, ending in $V_0'$ (respectively in $u_1$ and $u_3$) and sharing no other vertex than $u_0$. Similarly, nodes $u_2$, $u_4$ and $u_6$ must be added as well. On the contrary, $u_5$ must not be added, since every non-empty path starting from $u_5$ has $u_6$ as second vertex. More generally, a node with only one child cannot be a "divergence point closest to $V'$" and must never be added to build the weak control-closure. The weak control-closure of $V_0'$ in $G_0$ is thus $\{u_0, u_1, u_2, u_3, u_4, u_6\}$.

To build the closure, Danicic's algorithm, like the one we propose, does not directly try to build the two paths sharing only one node. Both algorithms rely on a concept called observable vertex. Given a vertex $u \in V$, the set of observable vertices in $V'$ from $u$ contains all nodes of $V'$ reachable from $u$ without using edges starting in $V'$. The important property of this object is that, as will be proved by Lemma 4, if there exists an edge $(u, v) \in E$ such that $u$ is not in $V'$, $u$ is reachable from $V'$, $v$ can reach $V'$ and there exists a vertex $w$ observable from $u$ but not from $v$, then $u$ must be added to $V'$ to build the weak control-closure. Figure 2a shows our example graph $G_0$, each node being annotated with its set of observables in $V_0'$.

[Fig. 2 panels: (a) w.r.t. $V_0' = \{u_1, u_3\}$; (b) w.r.t. $V_0'' = V_0' \cup \{u_0, u_2, u_4\}$; (c) w.r.t. $V_0''' = V_0'' \cup \{u_6\}$]

Fig. 2. Example graph $G_0$ annotated with observable sets

$(u_0, u_1)$ is an edge such that $u_0$ is reachable from $V_0'$, $u_1$ can reach $V_0'$ and $u_3$ is an observable vertex from $u_0$ in $V_0'$ but not from $u_1$. $u_0$ is thus a node to be added to the weak control-closure. Likewise, from the edges $(u_2, u_3)$ and $(u_4, u_3)$, we can deduce that $u_2$ and $u_4$ belong to the closure. However, we have seen that $u_6$ belongs to the closure, but it is not possible to apply the same reasoning to $(u_6, u_0)$, $(u_6, u_4)$ or $(u_6, u_5)$. We need another technique. As Lemma 3 will establish, the technique is actually iterative. We can add to the initial $V_0'$ the nodes that we have already detected and apply our technique to this new set $V_0''$. The vertices detected this way will also be in the closure of the initial set $V_0'$. The observable sets w.r.t. $V_0'' = V_0' \cup \{u_0, u_2, u_4\}$ are shown in Fig. 2b. This time, both edges $(u_6, u_4)$ and $(u_6, u_0)$ allow us to add $u_6$ to the closure. Applying the technique again with the augmented set $V_0''' = V_0'' \cup \{u_6\}$ (cf. Fig. 2c) does not reveal new vertices. This means that all the nodes have already been found. We obtain the same set as before for the weak control-closure of $V_0'$, i.e. $\{u_0, u_1, u_2, u_3, u_4, u_6\}$.


3 Basic Concepts 


This section introduces basic definitions and properties needed to define the 
notion of weak control-closure. They have been formalized in Coq [17], including 
in particular Property 3 whose proof in [11] was inaccurate. 

From now on, let $G = (V, E)$ denote a directed graph, and $V'$ a subset of $V$. We define a path in $G$ in the usual way. We write $u \xrightarrow{\text{Path}} v$ if there exists a path from $u$ to $v$. Let $R_G(V') = \{v \in V \mid \exists u \in V',\ u \xrightarrow{\text{Path}} v\}$ be the set of nodes reachable from $V'$. In our example (cf. Fig. 1), $u_6, u_0, u_1, u_6$ is a (4-node) path in $G_0$, $u_1$ is a trivial one-node path in $G_0$ from $u_1$ to itself, and $R_{G_0}(V_0') = V_0$.
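A plain forward search computes the reachable set $R_G(V')$. The following Java sketch is only illustrative: the class and method names are ours, and the authors' implementation is in OCaml. Fig. 1 is not reproduced here, so the adjacency lists of $G_0$ (vertex $i$ stands for $u_i$) are a reconstruction. The edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are our assumption; all other edges are named in the text.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Sketch: R_G(V'), the vertices reachable from V' (each source reaches itself).
final class Reachability {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static Set<Integer> reachable(int[][] succ, Set<Integer> sources) {
        Set<Integer> seen = new TreeSet<>(sources);      // trivial one-node paths
        Deque<Integer> todo = new ArrayDeque<>(sources);
        while (!todo.isEmpty()) {
            int u = todo.pop();
            for (int v : succ[u]) {
                if (seen.add(v)) {                        // v discovered for the first time
                    todo.push(v);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(G0, Set.of(1, 3))); // [0, 1, 2, 3, 4, 5, 6]
        System.out.println(reachable(G0, Set.of(3)));    // [3]
    }
}
```

On this graph every vertex is reachable from $V_0' = \{u_1, u_3\}$, consistent with $R_{G_0}(V_0') = V_0$.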
Definition 1 ($V'$-disjoint, $V'$-path). A path $\pi$ in $G$ is said to be $V'$-disjoint in $G$ if all the vertices in $\pi$ but the last one are not in $V'$. A $V'$-path in $G$ is a $V'$-disjoint path whose last vertex is in $V'$. In particular, if $u \in V'$, the only $V'$-path starting from $u$ is the trivial path $u$.

We write $u \xrightarrow{V'\text{-disjoint}} v$ (resp. $u \xrightarrow{V'\text{-path}} v$) if there exists a $V'$-disjoint path (resp. a $V'$-path) from $u$ to $v$.


Example. In $G_0$, $u_3$; $u_2, u_3$; $u_0, u_1$; and $u_0, u_2, u_3$ are $V_0'$-paths and thus $V_0'$-disjoint paths. $u_6, u_0$ is a $V_0'$-disjoint path but not a $V_0'$-path.
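Definition 1 can be checked mechanically on a given vertex sequence. The Java sketch below uses illustrative names and tests example sequences against a reconstructed $G_0$ in which vertex $i$ stands for $u_i$. Because Fig. 1 is not reproduced here, the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are assumed.

```java
import java.util.List;
import java.util.Set;

// Sketch of Definition 1: is a vertex sequence a V'-disjoint path, or a V'-path?
final class VPathCheck {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    // A non-empty sequence whose consecutive vertices are joined by edges of G.
    static boolean isPath(int[][] succ, List<Integer> p) {
        if (p.isEmpty()) return false;
        for (int i = 0; i + 1 < p.size(); i++) {
            boolean edge = false;
            for (int s : succ[p.get(i)]) {
                if (s == p.get(i + 1)) edge = true;
            }
            if (!edge) return false;
        }
        return true;
    }

    // V'-disjoint: every vertex except the last one lies outside V'.
    static boolean isDisjoint(int[][] succ, Set<Integer> vPrime, List<Integer> p) {
        if (!isPath(succ, p)) return false;
        for (int i = 0; i + 1 < p.size(); i++) {
            if (vPrime.contains(p.get(i))) return false;
        }
        return true;
    }

    // V'-path: a V'-disjoint path whose last vertex is in V'.
    static boolean isVPath(int[][] succ, Set<Integer> vPrime, List<Integer> p) {
        return isDisjoint(succ, vPrime, p) && vPrime.contains(p.get(p.size() - 1));
    }

    public static void main(String[] args) {
        Set<Integer> v0 = Set.of(1, 3);
        System.out.println(isVPath(G0, v0, List.of(0, 2, 3)));    // true
        System.out.println(isDisjoint(G0, v0, List.of(6, 0)));    // true
        System.out.println(isVPath(G0, v0, List.of(6, 0)));       // false: u0 is not in V0'
        System.out.println(isVPath(G0, v0, List.of(1, 6, 0, 1))); // false: starts inside V0'
    }
}
```

The last call illustrates the final sentence of Definition 1: from a vertex of $V'$, only the trivial path is a $V'$-path.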


Remark 1. Definition 1 and the following ones are slightly different from [11], 
where a V’-path must contain at least two vertices and there is no constraint 
on its first vertex, which can be in V’ or not. Our definitions lead to the same 
notion of weak control-closure. 


Definition 2 ($V'$-weakly committing vertex). A vertex $u$ in $G$ is $V'$-weakly committing if all the $V'$-paths from $u$ have the same end point (in $V'$). In particular, any vertex $u \in V'$ is $V'$-weakly committing.


Example. In $G_0$, $u_1$ and $u_3$ are the only $V_0'$-weakly committing nodes.
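A vertex is $V'$-weakly committing exactly when the $V'$-paths leaving it reach at most one vertex of $V'$. It therefore suffices to collect their end points with a forward search that never continues past a vertex of $V'$. A vertex with no $V'$-path at all counts as committing (vacuously). The following is a minimal Java sketch with illustrative names. It assumes a reconstructed $G_0$ (vertex $i$ stands for $u_i$) in which the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are our guess, since Fig. 1 is not reproduced.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Sketch of Definition 2: u is V'-weakly committing iff its V'-paths share one end point.
final class WeaklyCommitting {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    // End points of all V'-paths from u: walk forward, never continuing past a V' vertex.
    static Set<Integer> endPoints(int[][] succ, Set<Integer> vPrime, int u) {
        Set<Integer> ends = new TreeSet<>();
        if (vPrime.contains(u)) {              // only the trivial V'-path u
            ends.add(u);
            return ends;
        }
        boolean[] seen = new boolean[succ.length];
        Deque<Integer> todo = new ArrayDeque<>();
        seen[u] = true;
        todo.push(u);
        while (!todo.isEmpty()) {
            int x = todo.pop();
            for (int y : succ[x]) {
                if (seen[y]) continue;
                seen[y] = true;
                if (vPrime.contains(y)) ends.add(y); // a V'-path stops at its first V' vertex
                else todo.push(y);                   // still outside V': keep walking
            }
        }
        return ends;
    }

    static boolean isCommitting(int[][] succ, Set<Integer> vPrime, int u) {
        return endPoints(succ, vPrime, u).size() <= 1;
    }

    static Set<Integer> committingVertices(int[][] succ, Set<Integer> vPrime) {
        Set<Integer> result = new TreeSet<>();
        for (int u = 0; u < succ.length; u++) {
            if (isCommitting(succ, vPrime, u)) result.add(u);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(committingVertices(G0, Set.of(1, 3))); // [1, 3]
        System.out.println(endPoints(G0, Set.of(1, 3), 5));       // [1, 3]
    }
}
```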


Definition 3 (Weakly control-closed set). A subset V’ of V is weakly 
control-closed in G if every vertex reachable from V' is V'-weakly committing. 


Example. Since, in particular, $u_2$ is not $V_0'$-weakly committing and is reachable from $V_0'$, $V_0'$ is not weakly control-closed in $G_0$. $\emptyset$, singletons and the set of all nodes $V_0$ are trivially weakly control-closed. Less trivial weakly control-closed sets include $\{u_0, u_1\}$, $\{u_4, u_5, u_6\}$ and $\{u_0, u_1, u_2, u_3, u_4, u_6\}$.
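Combining reachability with the end-point search gives a direct, unoptimized check of Definition 3. The sketch below is illustrative Java with names of our own choosing. It uses a reconstructed $G_0$ in which the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are assumed. The closedness of $\{u_4, u_5, u_6\}$ in particular depends on that reconstruction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Sketch of Definition 3: V' is weakly control-closed iff every vertex reachable
// from V' is V'-weakly committing (its V'-paths have at most one end point).
final class WeaklyClosed {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static Set<Integer> reachable(int[][] succ, Set<Integer> sources) {
        Set<Integer> seen = new TreeSet<>(sources);
        Deque<Integer> todo = new ArrayDeque<>(sources);
        while (!todo.isEmpty()) {
            for (int v : succ[todo.pop()]) {
                if (seen.add(v)) todo.push(v);
            }
        }
        return seen;
    }

    // End points of the V'-paths starting at u.
    static Set<Integer> endPoints(int[][] succ, Set<Integer> vPrime, int u) {
        Set<Integer> ends = new TreeSet<>();
        if (vPrime.contains(u)) {
            ends.add(u);
            return ends;
        }
        boolean[] seen = new boolean[succ.length];
        Deque<Integer> todo = new ArrayDeque<>();
        seen[u] = true;
        todo.push(u);
        while (!todo.isEmpty()) {
            for (int y : succ[todo.pop()]) {
                if (seen[y]) continue;
                seen[y] = true;
                if (vPrime.contains(y)) ends.add(y); else todo.push(y);
            }
        }
        return ends;
    }

    static boolean isClosed(int[][] succ, Set<Integer> vPrime) {
        for (int u : reachable(succ, vPrime)) {
            if (endPoints(succ, vPrime, u).size() > 1) return false; // u is not V'-weakly committing
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isClosed(G0, Set.of(1, 3)));             // false
        System.out.println(isClosed(G0, Set.of(0, 1)));             // true
        System.out.println(isClosed(G0, Set.of(4, 5, 6)));          // true
        System.out.println(isClosed(G0, Set.of(0, 1, 2, 3, 4, 6))); // true
    }
}
```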


Definition 3 characterizes a weakly control-closed set, but does not explain 
how to build one. It would be particularly interesting to build the smallest weakly 
control-closed set containing a given set V’. The notion of weakly deciding vertex 
will help us to give an explicit expression to that set. 


Definition 4 ($V'$-weakly deciding vertex). A vertex $u$ is $V'$-weakly deciding if there exist at least two non-trivial $V'$-paths from $u$ that share no vertex except $u$. Let $\mathit{WD}_G(V')$ denote the set of $V'$-weakly deciding vertices in $G$.


Property 1. If $u \in V'$, then $u \notin \mathit{WD}_G(V')$ (by Definitions 1 and 4).

Example. In $G_0$, by Property 1, $u_1, u_3 \notin \mathit{WD}_{G_0}(V_0')$. We have illustrated the definition for nodes $u_0$ and $u_5$ in Sect. 2. We have $\mathit{WD}_{G_0}(V_0') = \{u_0, u_2, u_4, u_6\}$.
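On small graphs, Definition 4 can be checked by brute force. The method enumerates the simple non-trivial $V'$-paths from $u$ and looks for two of them whose vertices, apart from $u$, are disjoint. Restricting to simple paths loses nothing, because shortcutting a path only removes vertices. This exponential Java sketch is for illustration only and is not an algorithm from the paper. It uses a reconstructed $G_0$ in which the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are assumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Brute-force sketch of Definition 4 (exponential; only meant for tiny graphs).
final class WeaklyDeciding {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    // Records, for each simple V'-path extending `path`, its vertices except the start.
    private static void collect(int[][] succ, Set<Integer> vPrime, List<Integer> path,
                                List<Set<Integer>> out) {
        int x = path.get(path.size() - 1);
        for (int y : succ[x]) {
            if (path.contains(y)) continue;                // simple paths suffice
            path.add(y);
            if (vPrime.contains(y)) {
                out.add(new HashSet<>(path.subList(1, path.size()))); // path ends at y
            } else {
                collect(succ, vPrime, path, out);          // keep extending outside V'
            }
            path.remove(path.size() - 1);                  // backtrack (index-based remove)
        }
    }

    static boolean isWeaklyDeciding(int[][] succ, Set<Integer> vPrime, int u) {
        if (vPrime.contains(u)) return false;              // Property 1
        List<Set<Integer>> paths = new ArrayList<>();
        List<Integer> start = new ArrayList<>();
        start.add(u);
        collect(succ, vPrime, start, paths);
        for (int i = 0; i < paths.size(); i++) {
            for (int j = i + 1; j < paths.size(); j++) {
                if (Collections.disjoint(paths.get(i), paths.get(j))) return true; // share only u
            }
        }
        return false;
    }

    static Set<Integer> weaklyDeciding(int[][] succ, Set<Integer> vPrime) {
        Set<Integer> result = new TreeSet<>();
        for (int u = 0; u < succ.length; u++) {
            if (isWeaklyDeciding(succ, vPrime, u)) result.add(u);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(weaklyDeciding(G0, Set.of(1, 3)));             // [0, 2, 4, 6]
        System.out.println(weaklyDeciding(G0, Set.of(0, 1, 2, 3, 4, 6))); // []
    }
}
```

The second call also illustrates Property 3 on this graph: once $\mathit{WD}_{G_0}(V_0')$ is added to $V_0'$, no weakly deciding vertex remains.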


Lemma 1 (Characterization of being weakly control-closed). $V'$ is weakly control-closed in $G$ if and only if there is no $V'$-weakly deciding vertex in $G$ reachable from $V'$.

Example. In $G_0$, $u_2$ is reachable from $V_0'$ and is $V_0'$-weakly deciding. This gives another proof that $V_0'$ is not weakly control-closed.

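Here are two other useful properties of $\mathit{WD}_G$.

Property 2. $\forall V_1', V_2' \subseteq V,\ V_1' \subseteq V_2' \implies \mathit{WD}_G(V_1') \subseteq V_2' \cup \mathit{WD}_G(V_2')$.

Property 3. $\mathit{WD}_G(V' \cup \mathit{WD}_G(V')) = \emptyset$.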


We can prove that adding to a given set V’ the V’-weakly deciding nodes 
that are reachable from V’ gives a weakly control-closed set in G. This set is the 
smallest superset of V’ weakly control-closed in G. 


Lemma 2 (Existence of the weak control-closure). Let $W = \mathit{WD}_G(V') \cap R_G(V')$ denote the set of vertices in $\mathit{WD}_G(V')$ that are reachable from $V'$. Then $V' \cup W$ is the smallest weakly control-closed set containing $V'$.

Definition 5 (Weak control-closure). We call weak control-closure of $V'$, denoted $\mathit{WCC}_G(V')$, the smallest weakly control-closed set containing $V'$.

Property 4. Let $V'$, $V_1'$ and $V_2'$ be subsets of $V$. Then

(a) $\mathit{WCC}_G(V') = V' \cup (\mathit{WD}_G(V') \cap R_G(V')) = (V' \cup \mathit{WD}_G(V')) \cap R_G(V')$.
(b) If $V_1' \subseteq V_2'$, then $\mathit{WCC}_G(V_1') \subseteq \mathit{WCC}_G(V_2')$.
(c) If $V'$ is weakly control-closed, then $\mathit{WCC}_G(V') = V'$.
(d) $\mathit{WCC}_G(\mathit{WCC}_G(V')) = \mathit{WCC}_G(V')$.
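On a seven-vertex graph, Lemma 2 and Property 4 can be cross-checked by enumerating all $2^{|V|}$ subsets and keeping the weakly control-closed supersets of $V'$. The Java sketch below uses illustrative names and is exponential, so it only suits tiny graphs. It returns the smallest such superset and throws if that set is not contained in every other one. It assumes a reconstructed $G_0$ in which the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are guessed, since Fig. 1 is not reproduced.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Brute-force sketch: WCC_G(V') as the least weakly control-closed superset of V'.
final class WeakClosureBruteForce {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static Set<Integer> reachable(int[][] succ, Set<Integer> sources) {
        Set<Integer> seen = new TreeSet<>(sources);
        Deque<Integer> todo = new ArrayDeque<>(sources);
        while (!todo.isEmpty()) {
            for (int v : succ[todo.pop()]) if (seen.add(v)) todo.push(v);
        }
        return seen;
    }

    // End points of the V'-paths starting at u.
    static Set<Integer> endPoints(int[][] succ, Set<Integer> vPrime, int u) {
        Set<Integer> ends = new TreeSet<>();
        if (vPrime.contains(u)) { ends.add(u); return ends; }
        boolean[] seen = new boolean[succ.length];
        Deque<Integer> todo = new ArrayDeque<>();
        seen[u] = true;
        todo.push(u);
        while (!todo.isEmpty()) {
            for (int y : succ[todo.pop()]) {
                if (seen[y]) continue;
                seen[y] = true;
                if (vPrime.contains(y)) ends.add(y); else todo.push(y);
            }
        }
        return ends;
    }

    static boolean isClosed(int[][] succ, Set<Integer> s) {
        for (int u : reachable(succ, s)) if (endPoints(succ, s, u).size() > 1) return false;
        return true;
    }

    static Set<Integer> wcc(int[][] succ, Set<Integer> vPrime) {
        int n = succ.length;
        List<Set<Integer>> closed = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            Set<Integer> s = new TreeSet<>();
            for (int i = 0; i < n; i++) if ((mask & (1 << i)) != 0) s.add(i);
            if (s.containsAll(vPrime) && isClosed(succ, s)) closed.add(s);
        }
        Set<Integer> best = closed.get(0);    // V itself is closed, so the list is non-empty
        for (Set<Integer> s : closed) if (s.size() < best.size()) best = s;
        for (Set<Integer> s : closed) {       // Lemma 2: the least one lies inside every other one
            if (!s.containsAll(best)) throw new IllegalStateException("no least closed superset");
        }
        return best;
    }

    public static void main(String[] args) {
        Set<Integer> c = wcc(G0, Set.of(1, 3));
        System.out.println(c);                  // [0, 1, 2, 3, 4, 6]
        System.out.println(wcc(G0, c));         // [0, 1, 2, 3, 4, 6], Property 4d
        System.out.println(wcc(G0, Set.of(1))); // [1], a singleton is already closed
    }
}
```

The result coincides with $V_0' \cup (\mathit{WD}_{G_0}(V_0') \cap R_{G_0}(V_0'))$, as Property 4a predicts.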


4 Main Lemmas 


This section gives two lemmas used to justify both Danicic’s algorithm and ours. 


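Lemma 3. Let $V'$ and $W$ be two subsets of $V$. If $V' \subseteq W \subseteq V' \cup \mathit{WD}_G(V')$, then $W \cup \mathit{WD}_G(W) = V' \cup \mathit{WD}_G(V')$. If moreover $V' \subseteq W \subseteq \mathit{WCC}_G(V')$, then $\mathit{WCC}_G(W) = \mathit{WCC}_G(V')$.

Proof. Assume $V' \subseteq W \subseteq V' \cup \mathit{WD}_G(V')$. Since $V' \subseteq W$, we have by Property 2 $\mathit{WD}_G(V') \subseteq W \cup \mathit{WD}_G(W)$. Moreover, $W \subseteq V' \cup \mathit{WD}_G(V')$, thus $\mathit{WD}_G(W) \subseteq V' \cup \mathit{WD}_G(V') \cup \mathit{WD}_G(V' \cup \mathit{WD}_G(V'))$ by Property 2, hence $\mathit{WD}_G(W) \subseteq V' \cup \mathit{WD}_G(V')$ by Property 3. These inclusions imply $W \cup \mathit{WD}_G(W) = V' \cup \mathit{WD}_G(V')$.

If now $V' \subseteq W \subseteq \mathit{WCC}_G(V')$, then in particular $W \subseteq V' \cup \mathit{WD}_G(V')$, so the previous result applies. Moreover, $V' \subseteq W \subseteq R_G(V')$ implies $R_G(W) = R_G(V')$. Intersecting both sides of the previous equality with $R_G(V')$ and using Property 4a, we deduce $\mathit{WCC}_G(W) = \mathit{WCC}_G(V')$.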


Lemma 3 allows us to design iterative algorithms to compute the closure. Indeed, assume that we have a procedure which, for any non-weakly control-closed set $V'$, can return one or more elements of the weak control-closure of $V'$ not in $V'$. If we apply such a procedure to $V'$ once, we get a set $W$ that satisfies $V' \subsetneq W \subseteq \mathit{WCC}_G(V')$. From Lemma 3, $\mathit{WCC}_G(W) = \mathit{WCC}_G(V')$. To compute the weak control-closure of $V'$, it is thus sufficient to build the weak control-closure of $W$. We can apply our procedure again, this time to $W$, and repeatedly on all the successively computed sets, until we reach a weakly control-closed set, which by Property 4c is its own closure and hence equals $\mathit{WCC}_G(V')$. Since each set is a strict superset of the previous one, this iterative procedure terminates because graph $G$ is finite.

Before stating the second lemma, we introduce a key concept. It is called O in [11]. We use the name "observable" as in [26].


Definition 6 (Observable). Let $u \in V$. The set of observable vertices from $u$ in $V'$, denoted $\mathit{obs}_G(u, V')$, is the set of vertices $u'$ in $V'$ such that $u \xrightarrow{V'\text{-path}} u'$.

Remark 2. A vertex $u \in V'$ is its unique observable: $\mathit{obs}_G(u, V') = \{u\}$. The concept of observable set was illustrated in Fig. 2 (cf. Sect. 2).
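Computing $\mathit{obs}_G(u, V')$ only needs a forward search from $u$ that records the vertices of $V'$ it hits and never continues past them, in $O(|V| + |E|)$ time. Below is a Java sketch with illustrative names, on a reconstructed $G_0$ in which the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are assumed, since Fig. 1 is not reproduced.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Sketch of Definition 6: obs_G(u, V'), the vertices of V' ending some V'-path from u.
final class ObservableSets {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static Set<Integer> obs(int[][] succ, Set<Integer> vPrime, int u) {
        Set<Integer> result = new TreeSet<>();
        if (vPrime.contains(u)) {           // Remark 2: u is its own unique observable
            result.add(u);
            return result;
        }
        boolean[] seen = new boolean[succ.length];
        Deque<Integer> todo = new ArrayDeque<>();
        seen[u] = true;
        todo.push(u);
        while (!todo.isEmpty()) {
            int x = todo.pop();
            for (int y : succ[x]) {
                if (seen[y]) continue;
                seen[y] = true;
                if (vPrime.contains(y)) result.add(y); // observable: do not go beyond it
                else todo.push(y);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(obs(G0, Set.of(1, 3), 0));          // [1, 3]
        System.out.println(obs(G0, Set.of(1, 3), 1));          // [1]
        System.out.println(obs(G0, Set.of(0, 1, 2, 3, 4), 6)); // [0, 4]
    }
}
```

The last call reproduces $\mathit{obs}_{G_0}(u_6, W_3) = \{u_0, u_4\}$ from the example run of Algorithm 1.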


Lemma 4 (Sufficient condition for being $V'$-weakly deciding). Let $(u, v)$ be an edge in $G$ such that $u \notin V'$, $v$ can reach $V'$ and there exists a vertex $u'$ in $V'$ such that $u' \in \mathit{obs}_G(u, V')$ and $u' \notin \mathit{obs}_G(v, V')$. Then $u \in \mathit{WD}_G(V')$.

Proof. We need to exhibit two $V'$-paths from $u$ that share no vertex except $u$. We take a $V'$-path from $u$ to $u'$ as the first one, and a $V'$-path connecting $u$ to $V'$ through $v$ as the second one (we construct it by prepending $u$ to the smallest prefix of a path from $v$ to $V'$ that is a $V'$-path). If these $V'$-paths intersected at a node $y$ different from $u$, we would have a $V'$-path from $v$ to $u'$ by concatenating the paths from $v$ to $y$ and from $y$ to $u'$, which contradicts $u' \notin \mathit{obs}_G(v, V')$.


Example. In $G_0$, $\mathit{obs}_{G_0}(u_0, V_0') = \{u_1, u_3\}$ and $\mathit{obs}_{G_0}(u_1, V_0') = \{u_1\}$ (cf. Fig. 2a). Since $u_1$ is a child of $u_0$, we can apply Lemma 4 and deduce that $u_0$ is $V_0'$-weakly deciding. $\mathit{obs}_{G_0}(u_5, V_0') = \{u_1, u_3\}$ and $\mathit{obs}_{G_0}(u_6, V_0') = \{u_1, u_3\}$. We cannot apply Lemma 4 to $u_5$, and for good reason, since $u_5$ is not $V_0'$-weakly deciding. But we cannot apply Lemma 4 to $u_6$ either, since $u_6$ and all its children $u_0$, $u_4$ and $u_5$ have observable sets $\{u_1, u_3\}$ w.r.t. $V_0'$, while $u_6$ is $V_0'$-weakly deciding. This shows that Lemma 4 gives a sufficient condition, but not a necessary one, for proving that a vertex is weakly deciding.
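To see concretely that Lemma 4 is sufficient but not necessary, the following Java sketch applies the lemma to every edge and collects the vertices it proves weakly deciding. It uses illustrative names and the reconstructed $G_0$, in which the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are assumed. The condition "$v$ can reach $V'$" is tested as $\mathit{obs}_G(v, V') \neq \emptyset$. This is equivalent, because the shortest prefix of a path from $v$ to $V'$ that ends in $V'$ is a $V'$-path.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Sketch: vertices that Lemma 4 alone proves to be V'-weakly deciding.
final class Lemma4Check {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static Set<Integer> obs(int[][] succ, Set<Integer> vPrime, int u) {
        Set<Integer> result = new TreeSet<>();
        if (vPrime.contains(u)) { result.add(u); return result; }
        boolean[] seen = new boolean[succ.length];
        Deque<Integer> todo = new ArrayDeque<>();
        seen[u] = true;
        todo.push(u);
        while (!todo.isEmpty()) {
            for (int y : succ[todo.pop()]) {
                if (seen[y]) continue;
                seen[y] = true;
                if (vPrime.contains(y)) result.add(y); else todo.push(y);
            }
        }
        return result;
    }

    static Set<Integer> lemma4Detected(int[][] succ, Set<Integer> vPrime) {
        Set<Integer> detected = new TreeSet<>();
        for (int u = 0; u < succ.length; u++) {
            if (vPrime.contains(u)) continue;                  // the lemma requires u not in V'
            Set<Integer> obsU = obs(succ, vPrime, u);
            for (int v : succ[u]) {
                Set<Integer> obsV = obs(succ, vPrime, v);
                if (obsV.isEmpty()) continue;                  // v must be able to reach V'
                if (!obsV.containsAll(obsU)) detected.add(u);  // some u' seen from u, not from v
            }
        }
        return detected;
    }

    public static void main(String[] args) {
        System.out.println(lemma4Detected(G0, Set.of(1, 3)));          // [0, 2, 4]
        System.out.println(lemma4Detected(G0, Set.of(0, 1, 2, 3, 4))); // [6]
    }
}
```

With $V_0'$, $u_6$ is missing from the result although it belongs to $\mathit{WD}_{G_0}(V_0')$. After $u_0$, $u_2$ and $u_4$ are added, the lemma does detect $u_6$, which is the iterative behavior described in Sect. 2.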


Example. Let us apply Algorithm 1 to our running example $G_0$ (cf. Fig. 1). Initially, $W_0 = V_0' = \{u_1, u_3\}$.

1. $\mathit{obs}_{G_0}(u_0, W_0) = \{u_1, u_3\}$ and $\mathit{obs}_{G_0}(u_1, W_0) = \{u_1\}$, therefore $(u_0, u_1)$ is a $W_0$-critical edge. Set $W_1 = \{u_0, u_1, u_3\}$.

2. $\mathit{obs}_{G_0}(u_2, W_1) = \{u_0, u_3\}$ and $\mathit{obs}_{G_0}(u_3, W_1) = \{u_3\}$, therefore $(u_2, u_3)$ is a $W_1$-critical edge. Set $W_2 = \{u_0, u_1, u_2, u_3\}$.

3. $\mathit{obs}_{G_0}(u_4, W_2) = \{u_0, u_3\}$ and $\mathit{obs}_{G_0}(u_3, W_2) = \{u_3\}$, therefore $(u_4, u_3)$ is a $W_2$-critical edge. Set $W_3 = \{u_0, u_1, u_2, u_3, u_4\}$.

4. $\mathit{obs}_{G_0}(u_6, W_3) = \{u_0, u_4\}$ and $\mathit{obs}_{G_0}(u_0, W_3) = \{u_0\}$, therefore $(u_6, u_0)$ is a $W_3$-critical edge. Set $W_4 = \{u_0, u_1, u_2, u_3, u_4, u_6\}$.

5. There is no $W_4$-critical edge. $\mathit{WCC}_{G_0}(V_0') = W_4 = \{u_0, u_1, u_2, u_3, u_4, u_6\}$.
Input: $G = (V, E)$ a directed graph
       $V' \subseteq V$
Output: $W \subseteq V$ the weak control-closure of $V'$
Ensures: $W = \mathit{WCC}_G(V')$
begin
    $W \leftarrow V'$
    while there exists a $W$-critical edge in $E$ do
        choose such a $W$-critical edge $(u, v)$
        $W \leftarrow W \cup \{u\}$
    end
    return $W$
end
Algorithm 1. Danicic's original algorithm for weak control-closure [11]


5 Danicic’s Algorithm 


We present here the algorithm described in [11]. This algorithm and a proof of its correctness have been formalized in Coq [17]. The algorithm is almost entirely justified by the following lemma (Lemma 5, equivalent to [11, Lemma 60]).

We first need to introduce a new concept, which captures edges that are of 
particular interest when searching for weakly deciding vertices. This concept is 
taken from [11], where it was not given a name. We call such edges critical edges. 


Definition 7 (Critical edge). An edge $(u, v)$ in $G$ is called $V'$-critical if:

(1) $|\mathit{obs}_G(u, V')| \geq 2$;
(2) $|\mathit{obs}_G(v, V')| = 1$;
(3) $u$ is reachable from $V'$ in $G$.

Example. In $G_0$, $(u_0, u_1)$, $(u_2, u_3)$ and $(u_4, u_3)$ are the $V_0'$-critical edges.
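Definition 7 translates directly into code. The Java sketch below uses illustrative names and recomputes observable sets for every edge, which is simple but not efficient. It runs on the reconstructed $G_0$, in which the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are assumed because Fig. 1 is not reproduced.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of Definition 7: all V'-critical edges, listed in vertex order.
final class CriticalEdges {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static Set<Integer> reachable(int[][] succ, Set<Integer> sources) {
        Set<Integer> seen = new TreeSet<>(sources);
        Deque<Integer> todo = new ArrayDeque<>(sources);
        while (!todo.isEmpty()) {
            for (int v : succ[todo.pop()]) if (seen.add(v)) todo.push(v);
        }
        return seen;
    }

    static Set<Integer> obs(int[][] succ, Set<Integer> vPrime, int u) {
        Set<Integer> result = new TreeSet<>();
        if (vPrime.contains(u)) { result.add(u); return result; }
        boolean[] seen = new boolean[succ.length];
        Deque<Integer> todo = new ArrayDeque<>();
        seen[u] = true;
        todo.push(u);
        while (!todo.isEmpty()) {
            for (int y : succ[todo.pop()]) {
                if (seen[y]) continue;
                seen[y] = true;
                if (vPrime.contains(y)) result.add(y); else todo.push(y);
            }
        }
        return result;
    }

    static List<List<Integer>> criticalEdges(int[][] succ, Set<Integer> vPrime) {
        Set<Integer> reach = reachable(succ, vPrime);
        List<List<Integer>> result = new ArrayList<>();
        for (int u = 0; u < succ.length; u++) {
            if (!reach.contains(u)) continue;                   // condition (3)
            if (obs(succ, vPrime, u).size() < 2) continue;      // condition (1)
            for (int v : succ[u]) {
                if (obs(succ, vPrime, v).size() == 1) result.add(List.of(u, v)); // condition (2)
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(criticalEdges(G0, Set.of(1, 3)));          // [[0, 1], [2, 3], [4, 3]]
        System.out.println(criticalEdges(G0, Set.of(0, 1, 2, 3, 4))); // [[6, 0], [6, 4]]
    }
}
```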


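Lemma 5. If $V'$ is not weakly control-closed in $G$, then there exists a $V'$-critical edge $(u, v)$ in $G$. Moreover, if $(u, v)$ is such a $V'$-critical edge, then $u \in \mathit{WD}_G(V') \cap R_G(V')$, therefore $u \in \mathit{WCC}_G(V')$.

Proof. Since $V'$ is not weakly control-closed, by Lemma 1 there is a vertex $x$ in $\mathit{WD}_G(V')$ reachable from $V'$. There exists a $V'$-path $\pi$ from $x$ ending in $x' \in V'$. It follows that $|\mathit{obs}_G(x, V')| \geq 2$ and $|\mathit{obs}_G(x', V')| = 1$. Let $u$ be the last vertex on $\pi$ with at least two observable nodes in $V'$ and $v$ its successor on $\pi$. Then $(u, v)$ is a $V'$-critical edge: $u$ is reachable from $V'$ since $x$ is, and $v$, which lies on a $V'$-path, has exactly one observable.

Assume there exists a $V'$-critical edge $(u, v)$. Since $|\mathit{obs}_G(u, V')| \geq 2$ and $|\mathit{obs}_G(v, V')| = 1$, $u \notin V'$ (by Remark 2), $v$ can reach $V'$ and there exists $u'$ in $\mathit{obs}_G(u, V')$ but not in $\mathit{obs}_G(v, V')$. By Lemma 4, $u \in \mathit{WD}_G(V')$, and since $u$ is reachable from $V'$, $u \in \mathit{WCC}_G(V')$ by Property 4a.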


Remark 3. We can see in the proof above that we do not need the exact values 
2 and 1. We just need strictly more observable vertices for u than for v and at 
least one observable for v, to satisfy the hypotheses of Lemma 4. 


As described in Sect. 4, we can build an iterative algorithm constructing the 
weak control-closure of V’ by searching for critical edges on the intermediate sets 
built successively. This is the idea of Danicic’s algorithm shown as Algorithm 1. 
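A direct transcription of Algorithm 1 follows, as an illustrative Java sketch rather than the authors' OCaml code. The nondeterministic step "choose such a $W$-critical edge" is resolved here by taking the first critical edge in vertex order. On the reconstructed $G_0$ (edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ assumed, since Fig. 1 is not reproduced), this choice happens to follow the same four steps as the example run of Algorithm 1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of Algorithm 1: add the source of one W-critical edge until none is left.
final class DanicicAlgorithm {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static Set<Integer> reachable(int[][] succ, Set<Integer> sources) {
        Set<Integer> seen = new TreeSet<>(sources);
        Deque<Integer> todo = new ArrayDeque<>(sources);
        while (!todo.isEmpty()) {
            for (int v : succ[todo.pop()]) if (seen.add(v)) todo.push(v);
        }
        return seen;
    }

    static Set<Integer> obs(int[][] succ, Set<Integer> w, int u) {
        Set<Integer> result = new TreeSet<>();
        if (w.contains(u)) { result.add(u); return result; }
        boolean[] seen = new boolean[succ.length];
        Deque<Integer> todo = new ArrayDeque<>();
        seen[u] = true;
        todo.push(u);
        while (!todo.isEmpty()) {
            for (int y : succ[todo.pop()]) {
                if (seen[y]) continue;
                seen[y] = true;
                if (w.contains(y)) result.add(y); else todo.push(y);
            }
        }
        return result;
    }

    // Source u of the first W-critical edge (u, v) in vertex order, or -1 if there is none.
    static int firstCriticalSource(int[][] succ, Set<Integer> w) {
        Set<Integer> reach = reachable(succ, w);
        for (int u = 0; u < succ.length; u++) {
            if (!reach.contains(u) || obs(succ, w, u).size() < 2) continue;
            for (int v : succ[u]) {
                if (obs(succ, w, v).size() == 1) return u;
            }
        }
        return -1;
    }

    static Set<Integer> closure(int[][] succ, Set<Integer> vPrime, List<Integer> trace) {
        Set<Integer> w = new TreeSet<>(vPrime);
        for (int u = firstCriticalSource(succ, w); u != -1; u = firstCriticalSource(succ, w)) {
            w.add(u);        // u is in WCC_G(W), hence in WCC_G(V') (Lemmas 3 and 5)
            trace.add(u);
        }
        return w;            // no W-critical edge left: W is weakly control-closed
    }

    static List<Integer> trace(int[][] succ, Set<Integer> vPrime) {
        List<Integer> t = new ArrayList<>();
        closure(succ, vPrime, t);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(closure(G0, Set.of(1, 3), new ArrayList<>())); // [0, 1, 2, 3, 4, 6]
        System.out.println(trace(G0, Set.of(1, 3)));                      // [0, 2, 4, 6]
    }
}
```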
Proof of Algorithm 1. To establish the correctness of the algorithm, we prove by induction that $W_i$, the value of $W$ before iteration $i+1$, satisfies both $V' \subseteq W_i$ and $W_i \subseteq \mathit{WCC}_G(V')$ for any $i$. If $i = 0$, $W_0 = V'$, and both relations trivially hold. Let $i$ be a natural number such that $V' \subseteq W_i$, $W_i \subseteq \mathit{WCC}_G(V')$ and there exists a $W_i$-critical edge $(u, v)$. We have $W_{i+1} = W_i \cup \{u\}$. $V' \subseteq W_{i+1}$ is straightforward. By Lemma 5, $u \in \mathit{WCC}_G(W_i)$. Therefore, by Lemma 3, $u \in \mathit{WCC}_G(V')$, and thus $W_{i+1} \subseteq \mathit{WCC}_G(V')$. At the end of the algorithm, there is no $W$-critical edge, therefore $W$ is weakly control-closed by Lemma 5, and $\mathit{WCC}_G(W) = W$ by Property 4c. Since $V' \subseteq W$ and $W \subseteq \mathit{WCC}_G(V')$, Lemma 3 gives $W = \mathit{WCC}_G(W) = \mathit{WCC}_G(V')$. Termination follows from the fact that $W$ strictly increases in the loop and is upper-bounded by $\mathit{WCC}_G(V')$.

In terms of complexity, [11] shows that, assuming that the degree of each vertex is at most 2 (and thus that $O(|V|) = O(|E|)$), the complexity of the algorithm is $O(|V|^3)$. Indeed, the main loop of Algorithm 1 is run at most $O(|V|)$ times, and each loop body computes obs in $O(|V|)$ for at most $O(|V|)$ edges.




Remark 4. We propose two optimizations for Algorithm 1: 


— at each step, consider all critical edges rather than only one; 
— use the weaker definition of critical edge suggested in Remark 3. 


Example. We can replay Algorithm 1 using the first optimization. This run corresponds to the steps shown in Fig. 2. Initially, $W_0 = V_0' = \{u_1, u_3\}$.

1. $(u_0, u_1)$, $(u_2, u_3)$ and $(u_4, u_3)$ are $W_0$-critical edges. Set $W_1 = \{u_0, u_1, u_2, u_3, u_4\}$.
2. $(u_6, u_0)$ and $(u_6, u_4)$ are $W_1$-critical edges. Set $W_2 = \{u_0, u_1, u_2, u_3, u_4, u_6\}$.
3. There is no $W_2$-critical edge in $G_0$.

The optimized version computes the weak control-closure of $V_0'$ in $G_0$ in only 2 iterations instead of 4. This run also demonstrates that the algorithm is necessarily iterative: even when considering all $V_0'$-critical edges in the first step, $u_6$ is not detected before the second step.
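Both optimizations of Remark 4 fit in a few lines. Each round adds the sources of all current critical edges, and a flag switches to the weaker condition $|\mathit{obs}_G(u, W)| > |\mathit{obs}_G(v, W)| \geq 1$ of Remark 3, keeping condition (3). The following is an illustrative Java sketch with names of our own choosing, on the reconstructed $G_0$ in which the edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ are assumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of Remark 4: per round, add the sources of ALL W-critical edges.
final class DanicicAllEdges {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static Set<Integer> reachable(int[][] succ, Set<Integer> sources) {
        Set<Integer> seen = new TreeSet<>(sources);
        Deque<Integer> todo = new ArrayDeque<>(sources);
        while (!todo.isEmpty()) {
            for (int v : succ[todo.pop()]) if (seen.add(v)) todo.push(v);
        }
        return seen;
    }

    static Set<Integer> obs(int[][] succ, Set<Integer> w, int u) {
        Set<Integer> result = new TreeSet<>();
        if (w.contains(u)) { result.add(u); return result; }
        boolean[] seen = new boolean[succ.length];
        Deque<Integer> todo = new ArrayDeque<>();
        seen[u] = true;
        todo.push(u);
        while (!todo.isEmpty()) {
            for (int y : succ[todo.pop()]) {
                if (seen[y]) continue;
                seen[y] = true;
                if (w.contains(y)) result.add(y); else todo.push(y);
            }
        }
        return result;
    }

    // Sources of all W-critical edges; `weak` selects Remark 3's condition.
    static Set<Integer> criticalSources(int[][] succ, Set<Integer> w, boolean weak) {
        Set<Integer> reach = reachable(succ, w);
        Set<Integer> sources = new TreeSet<>();
        for (int u = 0; u < succ.length; u++) {
            if (!reach.contains(u)) continue;                 // condition (3) in both variants
            int ou = obs(succ, w, u).size();
            for (int v : succ[u]) {
                int ov = obs(succ, w, v).size();
                boolean critical = weak ? (ou > ov && ov >= 1) : (ou >= 2 && ov == 1);
                if (critical) sources.add(u);
            }
        }
        return sources;
    }

    // Returns the set of vertices added in each round.
    static List<Set<Integer>> rounds(int[][] succ, Set<Integer> vPrime, boolean weak) {
        Set<Integer> w = new TreeSet<>(vPrime);
        List<Set<Integer>> result = new ArrayList<>();
        for (Set<Integer> s = criticalSources(succ, w, weak); !s.isEmpty();
             s = criticalSources(succ, w, weak)) {
            w.addAll(s);
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rounds(G0, Set.of(1, 3), false)); // [[0, 2, 4], [6]]
        System.out.println(rounds(G0, Set.of(1, 3), true));  // [[0, 2, 4], [6]]
    }
}
```

Both variants need two rounds on this graph, matching the two iterations of the optimized run.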


6 The Optimized Algorithm 


Overview. A potential source of inefficiency in Danicic's algorithm is the fact that no information is shared between the iterations. The observable sets are recomputed at each iteration since the target set changes. This is why the first optimization proposed in Remark 4 is worthwhile: it allows working longer on the same set and thus reusing the observable sets.

We now propose to go even further: to store some information about the paths in the graph and reuse it in the following iterations. The main idea of the proposed algorithm is to label each processed node $u$ with a node $v \in W$ observable from $u$ in the set $W$ being progressively constructed by the algorithm. Labels survive through iterations and can be reused.

Unlike Danicic's algorithm, ours does not directly compute the weak control-closure. It actually computes the set $W = V' \cup \mathit{WD}_G(V')$. To obtain the closure $\mathit{WCC}_G(V') = W \cap R_G(V')$, $W$ is then simply filtered to keep only vertices reachable from $V'$ (cf. Property 4a).

In addition to speeding up the algorithm, the use of labels brings another benefit: for each node of $G$, its label indicates its observable vertex in $W$ (when it exists) at the end of the algorithm. Recall that since $\mathit{WD}_G(W) = \emptyset$ (by Property 3), each node in the graph has at most one observable vertex in $W$.

One difficult point with this approach is that the labels of the nodes need to 
be refreshed with care at each iteration so that they remain up-to-date. Actually, 
our algorithm does not ensure that at each iteration the label of each node is 
an observable vertex from this node in W. This state is only ensured at the 
beginning and at the end of the algorithm. Meanwhile, some nodes are still in 
the worklist and some labels are wrong, but this does not prevent the algorithm 
from working. 


Informal Description. Our algorithm is given a directed graph $G$ and a subset of vertices $V'$ in $G$. It manipulates three objects. The first is a set $W$, which is equal to $V'$ initially, grows during the algorithm and at the end contains the result, $V' \cup \mathit{WD}_G(V')$. The second is a partial mapping obs associating at most one label $\mathit{obs}[u]$ with each node $u$ in the graph; this label is a vertex in $W$ reachable from this node (and is the observable from $u$ in $V' \cup \mathit{WD}_G(V')$ at the end). The third is a worklist $L$ of nodes of the closure not processed yet. Each iteration proceeds as follows. If the worklist is not empty, a vertex $u$ is extracted from it. All the vertices that transitively precede vertex $u$ in the graph and that are not hidden by vertices in $W$ are labeled with $u$. During the propagation, nodes that are good candidates to be $V'$-weakly deciding are accumulated. After the propagation, we filter them so that only true $V'$-weakly deciding nodes are kept. Each of these vertices is associated with itself in obs, and is added to $W$ and $L$. If $L$ is not empty, a new iteration begins. Otherwise, $W$ is equal to $V' \cup \mathit{WD}_G(V')$ and obs associates each node in the graph with its observable vertex in the closure (when it exists).

Note that each iteration consists of two steps: a complete backward propagation in the graph, which collects potential $V'$-weakly deciding vertices, and a filtering step. The set of predecessors of the propagated node is thus filtered twice: once during the propagation and once afterwards. We can try to filter as much as possible in the first step or, conversely, avoid filtering during the first step and do all the work in the second step. For the sake of simplicity of the mechanized proof, the version we chose does only simple filtering during the first step. We accumulate in our candidate $V'$-weakly deciding nodes all nodes that have at least two children and a label different from the one currently propagated, and we eliminate the false positives in the second step, once the propagation is done.
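The informal description can be turned into a small Java sketch to see the labels at work. This is our simplified reading, not the Why3-verified algorithm, and all names are illustrative. In this sketch, `propagate` labels with $u$ every vertex that has a $W$-path to $u$. Candidates are vertices that already carried another label and have at least two successors. `confirm` keeps a candidate when some child carries a label other than $u$. All candidates are filtered after the propagation, before any label is updated. The sketch runs on the reconstructed $G_0$ (edges $u_1 \to u_6$, $u_2 \to u_0$ and $u_4 \to u_5$ assumed) and omits the final intersection with $R_G(V')$.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified sketch of the label-propagation idea (NOT the verified Why3 algorithm).
final class LabelPropagation {
    // Hypothetical G0: vertex i stands for u_i; edges 1->6, 2->0 and 4->5 are ASSUMED.
    static final int[][] G0 = { {1, 2}, {6}, {3, 0}, {}, {3, 5}, {6}, {0, 4, 5} };

    static final class Result {
        final Set<Integer> w;             // V' ∪ WD_G(V') at the end
        final Map<Integer, Integer> obs;  // label of each vertex reached so far
        int propagations;                 // number of worklist iterations
        Result(Set<Integer> w, Map<Integer, Integer> obs) { this.w = w; this.obs = obs; }
    }

    static List<List<Integer>> predecessors(int[][] succ) {
        List<List<Integer>> pred = new ArrayList<>();
        for (int i = 0; i < succ.length; i++) pred.add(new ArrayList<>());
        for (int x = 0; x < succ.length; x++) for (int y : succ[x]) pred.get(y).add(x);
        return pred;
    }

    // Backward search from u that stops at W; returns the candidate vertices.
    static List<Integer> propagate(List<List<Integer>> pred, int[][] succ, Set<Integer> w,
                                   Map<Integer, Integer> obs, int u) {
        List<Integer> candidates = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(u);
        queue.add(u);
        while (!queue.isEmpty()) {
            int x = queue.poll();
            for (int z : pred.get(x)) {
                if (w.contains(z) || !visited.add(z)) continue; // hidden by W, or already done
                Integer old = obs.put(z, u);
                if (old != null && old != u && succ[z].length >= 2) candidates.add(z);
                queue.add(z);
            }
        }
        return candidates;
    }

    // Keep candidate z (now labeled v) if some child carries a label other than v.
    static boolean confirm(int[][] succ, Map<Integer, Integer> obs, int z, int v) {
        for (int c : succ[z]) {
            Integer label = obs.get(c);
            if (label != null && label != v) return true;
        }
        return false;
    }

    static Result run(int[][] succ, Set<Integer> vPrime) {
        List<List<Integer>> pred = predecessors(succ);
        Result r = new Result(new TreeSet<>(vPrime), new HashMap<>());
        Deque<Integer> worklist = new ArrayDeque<>();
        for (int x : new TreeSet<>(vPrime)) { r.obs.put(x, x); worklist.addLast(x); }
        while (!worklist.isEmpty()) {
            int u = worklist.pollFirst();
            r.propagations++;
            List<Integer> confirmed = new ArrayList<>();
            for (int z : propagate(pred, succ, r.w, r.obs, u)) {
                if (confirm(succ, r.obs, z, u)) confirmed.add(z); // filter after propagation
            }
            for (int z : confirmed) { r.obs.put(z, z); r.w.add(z); worklist.addLast(z); }
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = run(G0, Set.of(1, 3));
        System.out.println(r.w);            // [0, 1, 2, 3, 4, 6]
        System.out.println(r.obs.get(5));   // 6
        System.out.println(r.propagations); // 6
    }
}
```

On this graph the sketch performs six propagations ($u_1$, $u_3$, $u_0$, $u_2$, $u_4$, $u_6$), in the order of Fig. 3, and every vertex ends up labeled by its observable in $W$, e.g. $\mathit{obs}[u_5] = u_6$. Since every vertex of this $G_0$ is reachable from $V_0'$, the result already equals the closure.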


Example. Let us use our running example (cf. Fig. 1) to illustrate the algorithm. The successive steps are represented in Fig. 3. In the different figures, nodes in $W$ already processed (that is, in $W \setminus L$) are represented using a solid double circle, while nodes in $W$ not yet processed (that is, still in worklist $L$) are represented using a dashed double circle. A label $u_j$ next to a node $u_i$ means that $u_i$ is associated with $u_j$, i.e. $\mathit{obs}[u_i] = u_j$.

[Fig. 3 panels: (a) after propagation of $u_1$; (b) after propagation of $u_3$; (c) after propagation of $u_0$; (d) after propagation of $u_2$; (e) after propagation of $u_4$; (f) after propagation of $u_6$]

Fig. 3. The optimized algorithm applied on $G_0$, where $V' = \{u_1, u_3\}$

Let us detail the first steps of the algorithm. Initially, $W_0 = V_0' = \{u_1, u_3\}$ (cf. Fig. 1).


1. $u_1$ is selected and its label is propagated backwards from $u_1$ (cf. Fig. 3a).
No candidate is found, so the first iteration ends with $W_1 = \{u_1, u_3\}$.

2. $u_3$ is selected and its label is propagated backwards from $u_3$ (cf. Fig. 3b).
$u_0$, $u_2$, $u_4$ and $u_6$ are candidates, but only $u_0$ is confirmed as a $V'_0$-weakly
deciding node. It is stored in worklist L and its label is set to $u_0$. Now
$W_2 = \{u_0, u_1, u_3\}$.

3-6. $u_0$, $u_2$, $u_4$ and $u_6$ are processed similarly (cf. Fig. 3c, d, e, f). At the end,
we get $W_6 = \{u_0, u_1, u_2, u_3, u_4, u_6\} = V'_0 \cup \mathit{WD}_{G_0}(V'_0)$.

As all nodes in $W_6$ are reachable from $V'_0$, $W_6 = \mathit{WCC}_{G_0}(V'_0)$.

We can make two remarks on this example. First, as we can see in Fig. 3f,
each node is labeled with its observable in W at the end of the algorithm. Second,
Fig. 3e shows a node with an obsolete label: $u_5$ is labeled $u_4$, while its only
observable node in W is $u_6$.


Detailed Description. Our algorithm is split into three functions:

– confirm checks whether a given node is V’-weakly deciding by looking for
a child whose label differs from the label passed as an argument.

– propagate takes a vertex and propagates a label backwards over its
predecessors. It returns a set of candidate V’-weakly deciding nodes.

– main calls propagate on a node of the closure not yet processed, gets
candidate V’-weakly deciding nodes, calls confirm to keep only true V’-weakly
deciding nodes, adds them to the closure and updates their labels, and loops
until no more V’-weakly deciding nodes are found.

Input: $G = (V, E)$, a directed graph
       $obs : \mathrm{Map}(V, V)$, associating at most one label to each vertex of $G$
       $u, v \in V$, vertices in $G$
Output: $b : \mathrm{bool}$
Ensures: $b = \mathit{true} \iff \exists u', (u, u') \in E \land u' \in obs \land obs[u'] \neq v$
Algorithm 2. Contract of confirm(G, obs, u, v)

Input: $G = (V, E)$, $W \subseteq V$, $obs : \mathrm{Map}(V, V)$, $u, v \in V$
Output: $obs'$, a new version of $obs$
        $C \subseteq V$, containing candidate W-weakly deciding nodes
Requires: (P1) $\forall z \in V,\ obs[z] = v \iff z = u$
Requires: (P2) $u \in W$
Ensures: (Q1) $\forall z \in V,\ z \xrightarrow{W\text{-path}} u \implies obs'[z] = v$
Ensures: (Q2) $\forall z \in V,\ \neg(z \xrightarrow{W\text{-path}} u) \implies obs'[z] = obs[z]$
Ensures: (Q3) $\forall z \in C,\ z \neq u \land z \xrightarrow{W\text{-path}} u$
Ensures: (Q4) $\forall z \in V,\ z \neq u \land z \xrightarrow{W\text{-path}} u \land z \in obs \land |succ_G(z)| > 1 \implies z \in C$
Algorithm 3. Contract of propagate(G, W, obs, u, v)


Function Confirm. A call to confirm(G, obs, u, v) takes four arguments: a graph
G, a labeling of graph vertices obs, and two vertices u and v. It returns true if
and only if at least one child u’ of u in G has a label in obs different from v,
which can be written $u' \in obs \land obs[u'] \neq v$. This simple function is left abstract
here for lack of space; the Why3 formalization [17] contains a complete proof.
Its contract is given as Algorithm 2.
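A minimal Python sketch of confirm, for illustration only. We assume (this is not the paper's representation) that G is a dictionary mapping each vertex to the list of its successors and that obs is a dictionary whose missing keys denote unlabeled vertices; the Why3 and OCaml code of [17] may be organized differently.

```python
from typing import Hashable, Iterable, Mapping

Vertex = Hashable


def confirm(succ: Mapping[Vertex, Iterable[Vertex]],
            obs: Mapping[Vertex, Vertex],
            u: Vertex, v: Vertex) -> bool:
    """Return True iff some child u' of u satisfies u' in obs and obs[u'] != v."""
    # Unlabeled children (absent from obs) never count as witnesses.
    return any(c in obs and obs[c] != v for c in succ.get(u, ()))
```

For example, if a has children b and c, with b labeled x and c unlabeled, confirm returns False for v = x; it becomes True as soon as c carries a label other than x.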
Function Propagate. A call to propagate(G, W, obs, u, v) takes five arguments: a 
graph G, a subset W of nodes of G, a labeling of nodes obs, and two vertices u 
and v. It traverses G backwards from u (stopping at nodes in W) and updates 
obs so that all predecessors not hidden by vertices in W have label v at the end 
of the function. It returns a set of potential V’-weakly deciding vertices. Again, 
this function is left abstract here but is proved in the Why3 development [17]. 
Its contract is given as Algorithm 3. 

propagate requires that, when called, only u is labeled with v (P1) and
that $u \in W$ (P2). It ensures that, after the call, all the predecessors of u not
hidden by a vertex in W are labeled v (Q1), the labels of the other nodes are
unchanged (Q2), C contains only predecessors of u but not u itself (Q3), and
all the predecessors that had a label before the call (different from v due to P1)
and that have at least two children are in C (Q4).
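Under the same assumed dictionary encoding, propagate can be sketched as a breadth-first traversal of the reversed graph that never enters W. Because of (P1), a vertex other than u that already carries label v must have been visited during the current call, so that test doubles as the visited set. This is an illustrative reading of the contract in Algorithm 3, not the verified implementation; the helper name predecessors is ours.

```python
from collections import deque
from typing import Dict, Hashable, List, Mapping, MutableMapping, Set

Vertex = Hashable


def predecessors(succ: Mapping[Vertex, List[Vertex]]) -> Dict[Vertex, Set[Vertex]]:
    """Invert the successor map so that the graph can be walked backwards."""
    pred: Dict[Vertex, Set[Vertex]] = {x: set() for x in succ}
    for x, ys in succ.items():
        for y in ys:
            pred.setdefault(y, set()).add(x)
    return pred


def propagate(succ: Mapping[Vertex, List[Vertex]], W: Set[Vertex],
              obs: MutableMapping[Vertex, Vertex], u: Vertex, v: Vertex) -> Set[Vertex]:
    """Label with v every vertex having a W-path to u (obs is updated in place and
    plays the role of obs'); return the candidate set C of Algorithm 3."""
    assert u in W  # precondition (P2); (P1) is assumed, not checked
    pred = predecessors(succ)
    candidates: Set[Vertex] = set()
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for z in pred.get(x, ()):
            # Stop at vertices of W; by (P1), obs[z] == v means z was already visited.
            if z in W or obs.get(z) == v:
                continue
            # z had a label (different from v) and at least two children: candidate (Q4).
            if z in obs and len(set(succ.get(z, ()))) > 1:
                candidates.add(z)
            obs[z] = v
            queue.append(z)
    return candidates
```

Recomputing the predecessor map at each call is wasteful; a real implementation would build the reversed graph once.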
Input: $G = (V, E)$, a directed graph
       $V' \subseteq V$, the input subset
Output: $W \subseteq V$, the main result
        $obs : \mathrm{Map}(V, V)$, the final labeling
Variables: $L \subseteq V$, a worklist of nodes to be treated
           $C \subseteq V$, a set of candidate V’-weakly deciding vertices
           $A \subseteq V$, a set of new V’-weakly deciding vertices
Ensures: $W = V' \cup \mathit{WD}_G(V')$
Ensures: $\forall u, v \in V,\ obs[u] = v \iff v \in obs_G(u, W)$

1  begin
2    $W \leftarrow V'$; $obs|_{V'} \leftarrow id_{V'}$; $L \leftarrow V'$          // initialization
3    while $L \neq \emptyset$ do                                              // main loop
       // invariant: I1 ∧ I2 ∧ I3 ∧ I4 ∧ I5 ∧ I6
       // variant: cardinal($L \cup (V \setminus W)$)
4      $u \leftarrow$ choose(L); $L \leftarrow L \setminus \{u\}$
5      $C \leftarrow$ propagate(G, W, obs, u, u)                                // propagation
6      $A \leftarrow \emptyset$
7      while $C \neq \emptyset$ do                                            // filtering
8        $v \leftarrow$ choose(C); $C \leftarrow C \setminus \{v\}$
9        if confirm(G, obs, v, u) = true then $A \leftarrow A \cup \{v\}$
10     end
11     $W \leftarrow W \cup A$; $obs|_{A} \leftarrow id_{A}$; $L \leftarrow L \cup A$      // update
12   end
     // assert: A1 ∧ A2 ∧ A3 ∧ A4
13   return (W, obs)
14 end

(I1) $\forall z \in W,\ obs[z] = z$
(I2) $\forall y, z \in V,\ obs[y] = z \implies z \in W$
(I3) $\forall y, z \in V,\ obs[y] = z \land z \in L \implies y = z$
(I4) $\forall y, z \in V,\ obs[y] = z \implies y \xrightarrow{*} z$
(I5) $V' \subseteq W \subseteq V' \cup \mathit{WD}_G(V')$
(I6) $\forall y, z, z' \in V,\ y \xrightarrow{W\text{-disjoint}} z \land obs[z] = z' \land z' \notin L \implies obs[y] = z'$
(A1) $\forall u, v \in V,\ v \in obs_G(u, W) \implies obs[u] = v$
(A2) $\mathit{WD}_G(W) = \emptyset$
(A3) $V' \subseteq W \subseteq V' \cup \mathit{WD}_G(V')$
(A4) $W = V' \cup \mathit{WD}_G(V')$
Algorithm 4. Function main with annotations


Function Main. The main function of our algorithm is given as Algorithm 4.
It takes two arguments: a graph G and a subset of vertices V’. It returns
$V' \cup \mathit{WD}_G(V')$ and a labeling associating to each node its observable vertex in this
set, if it exists. It maintains a worklist L of vertices that must be processed. L
is initially set to V’, and each node of V’ is labeled with itself (line 2). While L
is not empty, a node u is taken from it and propagate(G, W, obs, u, u) is called
(lines 3-5). It returns a set C of candidate V’-weakly deciding nodes that are not
added to W yet. They are first filtered using confirm (lines 6-10). The confirmed
nodes (A) are then added to W and to L, and the label of each of them is updated
to itself (line 11). The iterations stop when L is empty (cf. lines 3, 13).
Proof of the Optimized Algorithm. We opted for Why3 instead of Coq for
this proof to take advantage of Why3’s automation. Indeed, most of the goals
could be discharged in less than a minute using Alt-Ergo, CVC4, Z3 and E.
Some of them still needed to be proved manually in Coq, resulting in 330 lines
of Coq proof. The Why3 development [17] focuses on the proof of the algorithm,
not on the concepts presented in Sects. 3 and 4: most of these concepts are proved
in Why3, and one of them is assumed in Why3 but was previously proved in Coq.
Due to lack of space, we detail here only the main invariants necessary to prove
main (cf. Algorithm 4). The proofs of I1, I2, I3 and I4 are rather simple, while
those of I5 and I6 are more complex.

I1 states that each node in W has itself as a label. It is true initially for all
nodes in V’ and is preserved by the updates.

I2 states that all labels are in W. This is true initially since all labels are in
V’. It is preserved, since all updates use labels in W.

I3 states that labels in L have not been propagated yet: given a node y
in L, y is the only node whose label is y. It is true initially since every vertex in
V’ has itself as a label. After an update, the new nodes obey the same rule, so
I3 is preserved.

I4 states that if label z is associated to a node y, then there exists a path
from y to z. Initially, there exist trivial paths from each node in V’ to itself.
When obs is updated, there exists a W-path, thus in particular a path.

I5 states that W remains between V’ and $V' \cup \mathit{WD}_G(V')$ during the execution
of the algorithm. The first part, $V' \subseteq W$, is easy to prove, because it is true
initially and W is growing. For the second part, we need to prove that after the
filtering, $A \subseteq \mathit{WD}_G(V')$. For that, we prove that $A \subseteq \mathit{WD}_G(W)$ and conclude
thanks to Lemma 3. Let v be a node in A. Since $A \subseteq C$, we know that $v \notin W$ and
$u \in obs_G(v, W)$. Moreover, we have confirm(G, obs, v, u) = true, i.e. v has a
child v’ such that $v' \in obs$, hence v’ can reach W by I4, and $obs[v'] \neq u$, hence
$u \notin obs_G(v', W)$. We can apply Lemma 4 and deduce that $v \in \mathit{WD}_G(W)$.

I6 is the most complicated invariant. It states that if there is a path between
two vertices y and z that does not intersect W, and z has a label already
processed, then y and z have the same label. Let us give a sketch of the proof
of preservation of I6 after an iteration of the main loop. Let us denote by obs’ the
map at the end of the iteration. Let $y, z, z' \in V$ such that $y \xrightarrow{(W \cup A)\text{-disjoint}} z$,
$obs'[z] = z'$ and $z' \notin (L \setminus \{u\}) \cup A$. Let us show that $obs'[y] = z'$. First, observe
that neither y nor z can be in A, otherwise z’ would be in A, which would
be contradictory. We examine four cases depending on whether the conditions
$z \xrightarrow{W\text{-path}} u$ (H1) and $y \xrightarrow{W\text{-path}} u$ (H2) hold.

– H1 ∧ H2: Both z and y were given the label u during the last iteration, thus
$obs'[z] = obs'[y] = u$ as expected.

– H1 ∧ ¬H2: This case is impossible, since H1 and the (W ∪ A)-disjoint path
from y to z imply $y \xrightarrow{W\text{-path}} u$.

– ¬H1 ∧ ¬H2: Both z and y have the same label as before the iteration. We
can therefore conclude by I6 at the beginning of the iteration.

– ¬H1 ∧ H2: This is the only complicated case. We show that it is
contradictory. For that, we introduce $v_1$ as the last vertex on the (W ∪ A)-disjoint
path connecting y and z which is also the origin of a W-path to u, and $v_2$ as
its successor on this (W ∪ A)-disjoint path. We can show that $v_1 \in A$, which
contradicts the fact that it lies on a (W ∪ A)-disjoint path.


We can now prove the assertions A1, A2, A3 and A4 at the end of main. A1 is
a direct consequence of I6, since at the end $L = \emptyset$. A1 implies that each vertex u
has at most one observable in W, namely obs[u] if $u \in obs$. Since a W-weakly deciding
vertex would have two observables, we get A2: $\mathit{WD}_G(W) = \emptyset$. A3 is a direct consequence of
I5. A4 can be deduced from A2 and Lemma 3 applied to A3. This proves that at
the end $W = V' \cup \mathit{WD}_G(V')$. To prove the other post-condition, we must prove
that if there are two nodes u, v such that $obs[u] = v$, then $v \in obs_G(u, W)$. By
I4, there is a path from u to v. Let w be the first element in W on this path.
Then $u \xrightarrow{W\text{-path}} w$. By A1, $obs[u] = w$. Thus, $w = v$ and $u \xrightarrow{W\text{-path}} v$. This
proves the second post-condition.

7 Experiments

We have implemented Danicic’s algorithm (additionally improved by the
two optimizations proposed in Remark 4) and ours in OCaml [17] using the
OCamlgraph library [9], taking care to add a filtering step at the end of our
algorithm to preserve only nodes reachable from the initial subset. To be
confident in their correctness, we have tested both implementations on small
examples w.r.t. a certified but slow Coq-extracted implementation as an oracle.
We have also carefully checked that the results returned by both
implementations were the same in all experiments.

We have experimentally evaluated both implementations on thousands of 
random graphs with up to thousands of vertices, generated by OCamlgraph. For 
every number of vertices between 10 and 1000 (resp. 6500) that is a multiple of 
10, we generate 10 graphs with twice as many edges as vertices and randomly 
select three vertices to form the initial subset V’ and run both algorithms (resp. 
only our algorithm) on them. Although the initial subsets are small, the result- 
ing closures nearly always represent a significant part of the set of vertices of the 
graph. To avoid the trivial case, we have discarded the examples where the clo- 
sure is restricted to the initial subset itself (where execution time is insignificant), 
and computed the average time of the remaining tests. Results are presented in 
Fig. 4. Experiments have been performed on an Intel Core i7 4810MQ with 8 
cores at 2.80 GHz and 16 GB RAM. 
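The benchmark protocol described above can be sketched as follows. This is a hypothetical Python reconstruction: the experiments actually used OCamlgraph's random generator, whose precise behavior is not described here, so we simply assume simple directed graphs (no self-loops, no duplicate edges) with exactly 2|V| edges. The function names are ours.

```python
import random
from typing import Dict, Iterator, List, Set, Tuple


def random_instance(n: int, rng: random.Random) -> Tuple[Dict[int, List[int]], Set[int]]:
    """Random digraph with n vertices and 2n distinct non-loop edges,
    plus a random initial subset V' of three vertices."""
    if n < 3:
        raise ValueError("need n >= 3 for 2n distinct non-loop edges and |V'| = 3")
    edges: Set[Tuple[int, int]] = set()
    while len(edges) < 2 * n:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.add((a, b))
    succ: Dict[int, List[int]] = {x: [] for x in range(n)}
    for a, b in sorted(edges):
        succ[a].append(b)
    v_prime = set(rng.sample(range(n), 3))
    return succ, v_prime


def benchmark_sizes(max_n: int, per_size: int = 10) -> Iterator[Tuple[int, int]]:
    """Yield (n, k) for every multiple of 10 from 10 to max_n and k in 0..per_size-1."""
    for n in range(10, max_n + 1, 10):
        for k in range(per_size):
            yield n, k
```

Each instance would then be fed to the algorithms under test, discarding the runs whose closure is the initial subset itself, as in the protocol above.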

We observe that Danicic’s algorithm explodes for a few hundred vertices,
while our algorithm remains efficient for graphs with thousands of nodes.


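[Plot: execution time, time(s), as a function of the number of vertices |V|, for Danicic’s algorithm and our algorithm.]
Fig. 4. Danicic’s vs. our algorithm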




8 Related Work and Conclusion 


Related Work. The last decades have seen various definitions of control depen- 
dence given for larger and larger classes of programs [6,12,13,21,22,27]. To 
consider programs with exceptions and potentially infinite loops, Ranganath 
et al. [23] and then Amtoft [2] introduced non-termination sensitive and non- 
termination insensitive control dependence on arbitrary program structures. 
Danicic et al. [11] further generalized control dependence to arbitrary directed 
graphs, by defining weak and strong control-closure, which subsume the previous
non-termination insensitive and sensitive control dependence relations. They
also gave a control dependence semantics in terms of projections of paths in the
graph, which makes it possible to define new control dependence relations as long
as they are compatible with it. This elegant framework was reused for slicing extended finite
state machines [3] and probabilistic programs [4]. In both works, an algorithm 
computing weak control-closure, working differently from ours, was designed and 
integrated in a rather efficient slicing algorithm. 

While there exist efficient algorithms to compute the dominator tree in a 
graph [8,10,16,19], and even certified ones [15], and thus efficient algorithms 
computing control dependence when defined in terms of post-dominance, algo- 
rithms in the general case [2,11,23] are at least cubic. 

Mechanized verification of control dependence computation was done in for- 
malizations of program slicing. Wasserrab [26] formalized language-independent 
slicing in Isabelle/HOL, but did not provide an algorithm. Blazy et al. [7] and 
our previous work [18] formalized control dependence in Coq, respectively, for 
an intermediate language of the CompCert C compiler [20] and on a WHILE 
language with possible errors. 


Conclusion and Future Work. Danicic et al. claim that weak control-closure 
subsumes all other non-termination insensitive variants. It was thus a natural 
candidate for mechanized formalization. We used the Coq proof assistant to 
formalize it. A certified implementation of the algorithm can be extracted from 
the Coq development. During formalization in Coq of the algorithm and its 
proof, we have detected an inconsistency in a secondary proof, which highlights 
how useful proof assistants are to detect otherwise overlooked cases. To the best 
of our knowledge, the present work is the first mechanized formalization of weak 
control-closure and of an algorithm to compute it. In addition to formalizing 
Danicic’s algorithm in Coq, we have designed, formalized and proved a new 
one, that is experimentally shown to be faster than the original one. Short-term 
future work includes considering further optimizations. Long-term future work 
is to build a verified generic slicer. Indeed, generic control dependence is a first 
step towards it. Adding data dependence is the next step in this direction. 


Acknowledgements. We thank the anonymous reviewers for helpful suggestions. 
Abstract. The validation of modeling tools of custom domain-specific
languages (DSLs) frequently relies upon an automatically generated set
of models as a test suite. While many software testing approaches
recommend that this test suite should be diverse, model diversity has not been
studied systematically for graph models. In this paper, we propose
diversity metrics for models by exploiting neighborhood shapes as
abstraction. Furthermore, we propose an iterative model generation technique
to synthesize a diverse set of models where each model is taken from a dif- 
ferent equivalence class as defined by neighborhood shapes. We evaluate 
our diversity metrics in the context of mutation testing for an indus- 
trial DSL and compare our model generation technique with the popular 
model generator Alloy. 


1 Introduction 


Motivation. Domain-Specific Language (DSL) based modeling tools are playing
an increasingly important role in software development processes. Advanced DSL
frameworks such as Xtext or Sirius, built on top of model management frameworks
such as the Eclipse Modeling Framework (EMF) [37], significantly improve the
productivity of domain experts by automating the production of rich editor features.

Modelling environments may provide validation for the system under design 
from an early stage of development with efficient tool support for checking well- 
formedness (WF) constraints and design rules over large model instances of the 
DSL using tools like Eclipse OCL [24] or graph queries [41]. Model generation 
techniques [16,19,35,39] are able to automatically provide a range of solution 
candidates for allocation problems [19], model refactoring or context generation 
[21]. Finally, models can be processed by query-based transformations or code 
generators to automatically synthesize source code or other artifacts. 

The design of complex DSL tools is a challenging task. As the complexity of
DSL tools increases, special attention is needed to validate the modeling tools
themselves (e.g. for tool qualification purposes) to ensure that WF constraints
and the preconditions of model transformation and code generation functionality
[4,32,35] are correctly implemented in the tool.

© The Author(s) 2018


Problem Statement. There are many approaches aiming to address the test- 
ing of DSL tools (or transformations) [1,6,42] which necessitate the automated 
synthesis of graph models to serve as test inputs. Many best practices of testing 
(such as equivalence partitioning [26] or mutation testing [18]) recommend the
synthesis of diverse graph models where any pair of models is structurally
different from each other to achieve high coverage or a diverse solution space.
While software diversity is widely studied [5], existing diversity metrics for 
graph models are much less elaborate [43]. Model comparison techniques [38] 
frequently rely upon the existence of node identifiers, which can easily lead to 
many isomorphic models. Moreover, checking graph isomorphism is computationally
very costly. Therefore, practical solutions tend to use approximate techniques
to achieve some diversity by random sampling [17], incremental generation
[19,35], or symmetry breaking predicates [39]. Unlike equivalence
partitions which capture diversity of inputs in a customizable way for testing 
traditional software, a similar diversity concept is still missing for graph models. 


Contribution. In this paper, we propose diversity metrics to characterize a 
single model and a set of models. For that purpose, we innovatively reuse neigh- 
borhood graph shapes [28], which provide a fine-grained typing for each object 
based on the structure (e.g. incoming and outgoing edges) of its neighborhood. 
Moreover, we propose an iterative model generation technique to automatically 
synthesize a diverse set of models for a DSL where each model is taken from a 
different equivalence class wrt. graph shapes as an equivalence relation. 

We evaluate our diversity metrics and model generator in the context of 
mutation-based testing [22] of WF constraints in an industrial DSL tool. We 
evaluate and compare the mutation score and our diversity metrics of test suites 
obtained by (1) an Alloy based model generator (using symmetry breaking pred- 
icates to ensure diversity), (2) an iterative graph solver based generator using 
neighborhood shapes, and (3) a set of real models created by humans. Our finding
is that a diverse set of models derived along different neighborhood shapes has
a better mutation score. Furthermore, based on a test suite with 4850 models, we
found a high correlation between mutation score and our diversity metrics,
which indicates that our metrics may be good predictors in practice for testing.


Added Value. To the best of our knowledge, our paper is one of the first studies
on (software) model diversity. From a testing perspective, our diversity
metrics provide a stronger characterization of a test suite of models than traditional
metamodel coverage which is used in many research papers. Furthermore, model 
generators using neighborhood graph shapes (that keep models only if they are 
surely non-isomorphic) provide increased diversity compared to symmetry break- 
ing predicates (which exclude models if they are surely isomorphic). 
2 Preliminaries


Core modeling concepts and testing challenges of DSL tools will be illustrated in 
the context of Yakindu Statecharts [46], which is an industrial DSL for developing 
reactive, event-driven systems, and supports validation and code generation. 


2.1 Metamodels and Instance Models 


Metamodels define the main concepts, relations and attributes of a domain to 
specify the basic graph structure of models. A simplified metamodel for Yakindu
state machines is illustrated in Fig. 1 using the popular Eclipse Modeling
Framework (EMF) [37], which is used for domain modeling. A state machine consists of
Regions, which in turn contain states (called Vertexes) and Transitions. 
An abstract state Vertex is further refined into RegularStates (like State 
or FinalState) and PseudoStates (like Entry, Exit or Choice). 


[Figure: metamodel extract with classes Statechart, CompositeElement, Region, Transition, RegularState, State and FinalState, and references regions, vertices, outgoingTransitions, incomingTransitions, source and target.]
Fig. 1. Metamodel extract from Yakindu state machines


Formally [32,34], a metamodel defines a vocabulary of type and relation
symbols $\Sigma = \{C_1, \ldots, C_n, R_1, \ldots, R_m\}$ where a unary predicate symbol $C_i$ is defined
for each EClass, and a binary predicate symbol $R_j$ is derived for each EReference.
For space considerations, we omit the precise handling of attributes.

An instance model can be represented as a logic structure $M = \langle Obj_M, \mathcal{I}_M \rangle$
where $Obj_M$ is the finite set of objects (the size of the model is $|M| = |Obj_M|$),
and $\mathcal{I}_M$ provides interpretation for all predicate symbols in $\Sigma$ as follows:

– the interpretation of a unary predicate symbol $C_i$ is defined in accordance with
the types of the EMF model: $\mathcal{I}_M(C_i) : Obj_M \to \{1, 0\}$. An object $o \in Obj_M$
is an instance of a class $C_i$ in a model M if $\mathcal{I}_M(C_i)(o) = 1$.

– the interpretation of a binary predicate symbol $R_j$ is defined in accordance
with the links in the EMF model: $\mathcal{I}_M(R_j) : Obj_M \times Obj_M \to \{1, 0\}$. There
is a reference $R_j$ between $o_1, o_2 \in Obj_M$ in model M if $\mathcal{I}_M(R_j)(o_1, o_2) = 1$.


A metamodel also specifies extra structural constraints (type hierarchy, mul- 
tiplicities, etc.) that need to be satisfied in each valid instance model [32]. 
Example 1. Figure 2 shows graph representations of three (partial) instance
models. For the sake of clarity, Regions and the inverse relations incomingTransitions
and outgoingTransitions are excluded from the diagram. In M1, there are two
States (s1 and s2), which are connected in a loop via Transitions t2 and t3.
The initial state is marked by a Transition t1 from an entry e1 to state s1. M2
describes a similar statechart with three states in a loop (s3, s4 and s5 connected
via t5, t6 and t7). Finally, in M3 there are two main differences: there is an
incoming Transition t11 to an Entry state (e3), and there is a State s7 that does not
have an outgoing transition. While M1 and M2 are non-isomorphic, we later
illustrate why they are not diverse.


[Figure: instance models M1, M2 and M3 drawn as directed graphs.]
Fig. 2. Example instance models (as directed graphs)


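$$
\begin{array}{ll}
[\![C(v)]\!]^M_{\mathcal{Z}} := \mathcal{I}_M(C)(\mathcal{Z}(v)) & [\![\varphi_1 \land \varphi_2]\!]^M_{\mathcal{Z}} := [\![\varphi_1]\!]^M_{\mathcal{Z}} \land [\![\varphi_2]\!]^M_{\mathcal{Z}} \\
[\![R(v_1, v_2)]\!]^M_{\mathcal{Z}} := \mathcal{I}_M(R)(\mathcal{Z}(v_1), \mathcal{Z}(v_2)) & [\![\varphi_1 \lor \varphi_2]\!]^M_{\mathcal{Z}} := [\![\varphi_1]\!]^M_{\mathcal{Z}} \lor [\![\varphi_2]\!]^M_{\mathcal{Z}} \\
[\![v_1 = v_2]\!]^M_{\mathcal{Z}} := \mathcal{Z}(v_1) = \mathcal{Z}(v_2) & [\![\neg \varphi]\!]^M_{\mathcal{Z}} := \neg [\![\varphi]\!]^M_{\mathcal{Z}} \\
[\![\forall v : \varphi]\!]^M_{\mathcal{Z}} := \bigwedge_{x \in Obj_M} [\![\varphi]\!]^M_{\mathcal{Z}, v \mapsto x} & [\![\exists v : \varphi]\!]^M_{\mathcal{Z}} := \bigvee_{x \in Obj_M} [\![\varphi]\!]^M_{\mathcal{Z}, v \mapsto x}
\end{array}
$$

Fig. 3. Inductive semantics of graph predicates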


2.2 Well-Formedness Constraints as Logic Formulae 


In many industrial modeling tools, WF constraints are captured either by OCL 
constraints [24] or graph patterns (GP) [41] where the latter captures structural 
conditions over an instance model as paths in a graph. To have a unified and 
precise handling of evaluating WF constraints, we use a tool-independent logic 
representation (which was influenced by [29,32,34]) that covers the key features 
of concrete graph pattern languages and a first-order fragment of OCL. 


Syntax. A graph predicate is a first order logic predicate $\varphi(v_1, \ldots, v_n)$ over
(object) variables which can be inductively constructed by using class and
relation predicates $C(v)$ and $R(v_1, v_2)$, equality check $=$, standard first order logic
connectives $\neg$, $\lor$, $\land$, and quantifiers $\exists$ and $\forall$.
Semantics. A graph predicate $\varphi(v_1, \ldots, v_n)$ can be evaluated on model M along
a variable binding $\mathcal{Z} : \{v_1, \ldots, v_n\} \to Obj_M$ from variables to objects in M. The
truth value of $\varphi$ over model M along the mapping $\mathcal{Z}$ (denoted
by $[\![\varphi(v_1, \ldots, v_n)]\!]^M_{\mathcal{Z}}$) is defined by the semantic rules of Fig. 3.

A variable binding $\mathcal{Z}$ under which the predicate $\varphi$ evaluates to 1 over M,
formally $[\![\varphi]\!]^M_{\mathcal{Z}} = 1$, is often called a pattern match. Otherwise, if no
binding $\mathcal{Z}$ satisfies a predicate, i.e. $[\![\varphi]\!]^M_{\mathcal{Z}} = 0$ for all $\mathcal{Z}$, then the predicate
$\varphi$ evaluates to 0 over M, written $[\![\varphi]\!]^M = 0$. Graph query engines like [41] can retrieve (one or
all) matches of a graph predicate over a model. When graph patterns are used for
validating WF constraints, a match of a pattern usually denotes a violation, thus
the corresponding graph formula needs to capture the erroneous case.
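The inductive semantics of Fig. 3 translates directly into a recursive evaluator. The Python sketch below is only an illustration under our own encoding assumptions: formulas are nested tuples, a model stores for each class the set of its instances and for each reference the set of linked pairs, and quantifiers range over the finite set $Obj_M$. It is not the API of any graph query engine such as [41].

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Binding = Dict[str, str]


class Model:
    """M = <Obj_M, I_M>: I_M(C) as a set of objects, I_M(R) as a set of pairs."""

    def __init__(self, objects: Set[str], classes: Dict[str, Set[str]],
                 refs: Dict[str, Set[Tuple[str, str]]]):
        self.objects = objects
        self.classes = classes
        self.refs = refs


def ev(m: Model, f: tuple, z: Binding) -> bool:
    """[[f]]^M_Z following the rules of Fig. 3 (True stands for 1, False for 0)."""
    op = f[0]
    if op == "C":        # ("C", class, v)
        return z[f[2]] in m.classes.get(f[1], set())
    if op == "R":        # ("R", reference, v1, v2)
        return (z[f[2]], z[f[3]]) in m.refs.get(f[1], set())
    if op == "eq":       # ("eq", v1, v2)
        return z[f[1]] == z[f[2]]
    if op == "not":
        return not ev(m, f[1], z)
    if op == "and":
        return ev(m, f[1], z) and ev(m, f[2], z)
    if op == "or":
        return ev(m, f[1], z) or ev(m, f[2], z)
    if op == "exists":   # disjunction over all objects bound to the variable
        return any(ev(m, f[2], {**z, f[1]: o}) for o in m.objects)
    if op == "forall":   # conjunction over all objects bound to the variable
        return all(ev(m, f[2], {**z, f[1]: o}) for o in m.objects)
    raise ValueError(f"unknown connective: {op!r}")


def matches(m: Model, f: tuple, free: Sequence[str]) -> List[Binding]:
    """All bindings of the free variables under which f evaluates to 1."""
    result = []
    for objs in product(sorted(m.objects), repeat=len(free)):
        z = dict(zip(free, objs))
        if ev(m, f, z):
            result.append(z)
    return result
```

Enumerating all bindings is exponential in the number of free variables; this is only meant to make the semantics concrete, not to replace an incremental query engine.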


2.3 Motivation: Testing of DSL Tools 


A code generator would normally assume that the input models are well-formed, 
i.e. all WF constraints are validated prior to calling the code generator. How- 
ever, there is no guarantee that the WF constraints actually checked by the 
DSL tool are exactly the same as the ones required by the code generator. 
For instance, if the validation forgets to check a subclause of a WF constraint, 
then runtime errors may occur during code generation. Moreover, the precon- 
dition of the transformation rule may also contain errors. For that purpose, 
WF constraints and model transformations of DSL tools can be systematically 
tested. Alternatively, model validation can be interpreted as a special case of 
model transformation, where the preconditions of the transformation rules are fault
patterns, and the actions place error markers on the model [41].

A popular approach for testing DSL tools is mutation testing [22,36] which 
aims to reveal missing or extra predicates by (1) deriving a set of mutants (e.g. 
WF constraints in our case) by applying a set of mutation operators. Then (2) 
the test suite is executed for both the original and the mutant programs, and (3) 
their outputs are compared. (4) A mutant is killed by a test if a different output is
produced for the two cases (i.e. different match set). (5) The mutation score of a 
test suite is calculated as the ratio of mutants killed by some tests wrt. the total 
number of mutants. A test suite with better mutation score is preferred [18]. 
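Steps (2)-(5) of mutation testing amount to a simple computation. In the following Python sketch, programs are plain callables and run is a caller-supplied function returning the observable output of a program on a test; in the DSL setting one would assume that run returns the match set of a WF constraint on a test model. All names are illustrative.

```python
from typing import Callable, Hashable, Sequence


def mutation_score(original: Callable, mutants: Sequence[Callable], tests: Sequence,
                   run: Callable[[Callable, object], Hashable]) -> float:
    """Ratio of mutants killed by at least one test to the total number of mutants."""
    if not mutants:
        raise ValueError("the mutation score is undefined without mutants")
    killed = sum(
        1 for mutant in mutants
        # A mutant is killed if some test yields a different output than the original.
        if any(run(original, t) != run(mutant, t) for t in tests)
    )
    return killed / len(mutants)
```

With the original predicate x > 0 and x % 2 == 0 and its two omission mutants, the suite [2, 3] kills only the first mutant (score 0.5), whereas adding the test -2 also kills the second one (score 1.0).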


Fault Model and Detection. As a fault model, we consider omission faults 
in WF constraints of DSL tools where some subconstraints are not actually 
checked. In our fault model, a WF constraint is given in a conjunctive normal
form $\varphi_e = \varphi_1 \land \cdots \land \varphi_k$, all unbound variables are quantified existentially ($\exists$), and
it may refer to other predicates specified in the same form. Note that this format
is equivalent to first order logic, and does not reduce the range of supported
graph predicates. We assume that in a faulty predicate (a mutant) the developer
may forget to check one of the predicates $\varphi_i$ (Constraint Omission, CO), i.e.
$\varphi_e = [\varphi_1 \land \cdots \land \varphi_i \land \cdots \land \varphi_k]$ is rewritten to $\varphi_f = [\varphi_1 \land \cdots \land \varphi_{i-1} \land \varphi_{i+1} \land \cdots \land \varphi_k]$,
or may forget a negation (Negation Omission, NO), i.e. $\varphi_e = [\varphi_1 \land \cdots \land (\neg \varphi_i) \land \cdots \land \varphi_k]$
is rewritten to $\varphi_f = [\varphi_1 \land \cdots \land \varphi_i \land \cdots \land \varphi_k]$. Given an instance model M, we
assume that both $[\![\varphi_e]\!]^M$ and the faulty $[\![\varphi_f]\!]^M$ can be evaluated separately by
the DSL tool. Now a test model M detects a fault if there is a variable binding
$\mathcal{Z}$ where the two evaluations differ, i.e. $[\![\varphi_e]\!]^M_{\mathcal{Z}} \neq [\![\varphi_f]\!]^M_{\mathcal{Z}}$.
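The two mutation operators of the fault model can be sketched on an explicit list of conjuncts. In this hypothetical Python encoding, each conjunct is a triple (name, negated, predicate), where predicate maps a variable binding to a Boolean and hides any quantified subformula; detects implements the detection criterion above over a given finite list of bindings.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Binding = Dict[str, str]
# A conjunct is (name, negated, predicate); a constraint is the conjunction of its list.
Conjunct = Tuple[str, bool, Callable[[Binding], bool]]


def holds(conj: List[Conjunct], z: Binding) -> bool:
    """Evaluate the conjunction under z (a negated conjunct must evaluate to False)."""
    return all(pred(z) != negated for _, negated, pred in conj)


def constraint_omission(conj: List[Conjunct]) -> List[List[Conjunct]]:
    """CO: one mutant per conjunct, with that conjunct dropped."""
    return [conj[:i] + conj[i + 1:] for i in range(len(conj))]


def negation_omission(conj: List[Conjunct]) -> List[List[Conjunct]]:
    """NO: one mutant per negated conjunct, with its negation dropped."""
    return [conj[:i] + [(name, False, pred)] + conj[i + 1:]
            for i, (name, negated, pred) in enumerate(conj) if negated]


def detects(original: List[Conjunct], mutant: List[Conjunct],
            bindings: Sequence[Binding]) -> bool:
    """The fault is detected if some binding evaluates differently."""
    return any(holds(original, z) != holds(mutant, z) for z in bindings)
```

A constraint with k conjuncts, n of them negated, thus yields k + n mutants under this fault model.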


Example 2. Two WF constraints checked by the Yakindu environment can be
captured by graph predicates as follows:

– $\varphi$ : incomingToEntry(E) $:= \exists T : \mathit{Entry}(E) \land \mathit{target}(T, E)$
– $\phi$ : noOutgoingFromEntry(E) $:= \mathit{Entry}(E) \land \neg(\exists T : \mathit{source}(T, E))$

According to our fault model, we can derive two mutants for incomingToEntry
as predicates $\varphi^f_1 := \mathit{Entry}(E)$ and $\varphi^f_2 := \exists T : \mathit{target}(T, E)$.

Constraints $\varphi$ and $\phi$ are satisfied in models M1 and M2, as the corresponding
graph predicates have no matches, thus $[\![\varphi]\!]^{M_i} = 0$ and $[\![\phi]\!]^{M_i} = 0$ for $i \in \{1, 2\}$.
As test models, both M1 and M2 are able to detect the same omission fault for $\varphi^f_1$,
as $[\![\varphi^f_1]\!]^{M_i} = 1$ (with $E \mapsto e_1$ and $E \mapsto e_2$, respectively), and similarly for $\varphi^f_2$
(with $s_1$ and $s_3$). However, M3 is unable to kill mutant $\varphi^f_1$, as $\varphi$ has a match
$E \mapsto e_3$ which remains a match of $\varphi^f_1$, but it is able to detect the other faults.
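Example 2 can be replayed mechanically by comparing match sets. The Python sketch below hard-codes the three predicates over a deliberately minimal encoding that keeps only Entry membership and target links. M1 follows Example 1 (the direction of t2 and t3 is our assumption), whereas M3_LIKE is a hypothetical model in the spirit of M3: the source of t11 and the remaining transitions are our own choice, since Fig. 2 is not reproduced here.

```python
from typing import Callable, Dict, Set

Model = Dict[str, Set]  # hypothetical encoding with keys "objects", "Entry", "target"


def has_incoming(m: Model, e: str) -> bool:
    """∃T : target(T, E), with target links stored as (transition, vertex) pairs."""
    return any(v == e for (_, v) in m["target"])


def phi(m: Model, e: str) -> bool:     # incomingToEntry(E)
    return e in m["Entry"] and has_incoming(m, e)


def phi_f1(m: Model, e: str) -> bool:  # mutant: the target conjunct is omitted
    return e in m["Entry"]


def phi_f2(m: Model, e: str) -> bool:  # mutant: the Entry conjunct is omitted
    return has_incoming(m, e)


def match_set(pred: Callable[[Model, str], bool], m: Model) -> Set[str]:
    return {o for o in m["objects"] if pred(m, o)}


def kills(m: Model, mutant: Callable[[Model, str], bool]) -> bool:
    """A test model kills a mutant when the two match sets differ."""
    return match_set(phi, m) != match_set(mutant, m)


M1 = {"objects": {"e1", "s1", "s2", "t1", "t2", "t3"}, "Entry": {"e1"},
      "target": {("t1", "s1"), ("t2", "s2"), ("t3", "s1")}}
M3_LIKE = {"objects": {"e3", "s6", "s7", "t8", "t9", "t11"}, "Entry": {"e3"},
           "target": {("t8", "s6"), ("t9", "s7"), ("t11", "e3")}}
```

On M1 the original predicate has no match while both mutants do, so both are killed; on M3_LIKE the match E ↦ e3 is shared by $\varphi$ and $\varphi^f_1$, so only $\varphi^f_2$ is killed.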


3 Model Diversity Metrics for Testing DSL Tools 


As a general best practice in testing, a good test suite should be diverse, but 
the interpretation of diversity may differ. For example, equivalence partitioning 
[26] partitions the input space of a program into equivalence classes based on 
observable output, and then selects the test cases of a test suite from
different execution classes to achieve a diverse test suite. However, while software 
diversity has been studied extensively [5], model diversity is much less covered. 
In existing approaches [6,7,9,10,31,42] for testing DSL and transformation
tools, a test suite should provide full metamodel coverage [45], and it should also
guarantee that any pair of models in the test suite is non-isomorphic [17,39].
In [43], the diversity of a model $M_i$ is defined as the number of (direct) types
used from its metamodel MM, i.e. $M_i$ is more diverse than $M_j$ if more types of MM are used
in $M_i$ than in $M_j$. Furthermore, a model generator Gen deriving a set of models
$\{M_i\}$ is diverse if there is a designated distance between each pair of models
$M_i$ and $M_j$, i.e. $dist(M_i, M_j) > D$, but no concrete distance function is proposed.
Below, we propose diversity metrics for a single model, for pairs of models 
and for a set of models based on neighborhood shapes [28], a formal concept 
known from the state space exploration of graph transformation systems [27]. 
Our diversity metrics generalize both metamodel coverage and (graph)
isomorphism tests, which are derived as two extremes of the proposed metric, and thus
define a finer-grained equivalence partitioning technique for graph models.


3.1 Neighborhood Shapes of Graphs 


A neighborhood $Nbh_i$ describes the local properties of an object in a graph model
for a range of size $i \in \mathbb{N}$ [28]. The neighborhood of an object o describes all
unary (class) and binary (reference) relations of the objects within the given
range. Informally, neighborhoods can be interpreted as richer types, where
the original classes are split into multiple subclasses based on the differences
in the incoming and outgoing references. Formally, neighborhood descriptors are
defined recursively with the set of class and reference symbols $\Sigma$:


- For range $i = 0$, each neighborhood is a subset of the class symbols: $Nbh_0 \subseteq 2^{\{C_1, \dots, C_n\}}$.

- A neighbor $Ref_i$ for $i > 0$ is defined by a reference symbol and a neighborhood: $Ref_i \subseteq \{R_1, \dots, R_m\} \times Nbh_{i-1}$.

- For a range $i > 0$, neighborhood $Nbh_i$ is defined by a previous neighborhood and two sets of neighbor descriptors (for incoming and outgoing references separately): $Nbh_i \subseteq Nbh_{i-1} \times 2^{Ref_i} \times 2^{Ref_i}$.


Shaping function $nbh_i : Obj_M \to Nbh_i$ maps each object in a model M to a
neighborhood with range i: (1) if $i = 0$, then $nbh_0(o) = \{C \mid [\![C(o)]\!]^M = 1\}$; (2) if
$i > 0$, then $nbh_i(o) = (nbh_{i-1}(o), in, out)$, where

$$in = \{(R, n) \mid \exists o' \in Obj_M : [\![R(o', o)]\!]^M \land n = nbh_{i-1}(o')\}$$
$$out = \{(R, n) \mid \exists o' \in Obj_M : [\![R(o, o')]\!]^M \land n = nbh_{i-1}(o')\}$$


A (graph) shape of a model M for range i (denoted as $S_i(M)$) is the set of
neighborhood descriptors of the model: $S_i(M) = \{x \mid \exists o \in Obj_M : nbh_i(o) = x\}$.
A shape can be interpreted and illustrated as a type graph: after calculating
the neighborhood for each object, each neighborhood is represented as a node in
the graph shape. Moreover, if there exists at least one link between objects in two
different neighborhoods, the corresponding nodes in the shape will be connected
by an edge. We will use the size of a shape $|S_i(M)|$, which is the number of distinct
neighborhoods (shape nodes) used in M.
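The recursive definition can be computed by iterative partition refinement. The C sketch below is our own illustration, not the authors' tool: each range-i descriptor $(nbh_{i-1}(o), in, out)$ is encoded as the previous identifier plus two 64-bit masks over (reference, neighbourhood) pairs and interned into a dense identifier, so that $|S_i(M)|$ is simply the number of distinct identifiers. The bit-mask encoding, the size limits and the names `Link`, `refine` and `shape_sizes` are assumptions made for brevity, and the class and reference constants merely mirror the symbols of Example 3.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJ 32 /* objects per model handled by this sketch */
#define MAX_NBH 16 /* ids per range that fit the 64-bit masks (4 refs x 16 ids) */
#define MAX_REF 4  /* reference symbols R_1..R_m, here m <= 4 */

/* Hypothetical class and reference symbols mirroring Example 3. */
enum { VERTEX = 1, PSEUDOSTATE = 2, ENTRY = 4, STATE = 8,
       REGULARSTATE = 16, TRANSITION = 32 };
enum { SOURCE = 0, TARGET = 1 };

/* One link R(src, tgt) of the instance model. */
typedef struct { int src, ref, tgt; } Link;

/* Range-i descriptor: nbh_{i-1}(o) plus the in/out sets of (R, n) pairs. */
typedef struct { int prev; uint64_t in, out; } Desc;

/* Replaces the nbh_{i-1} ids in `ids` by nbh_i ids; returns |S_i(M)|. */
static int refine(int n_obj, const Link *links, int n_links, int *ids) {
    Desc d[MAX_OBJ], table[MAX_OBJ];
    int count = 0;
    for (int o = 0; o < n_obj; o++) {
        d[o].prev = ids[o];
        d[o].in = d[o].out = 0;
    }
    for (int l = 0; l < n_links; l++) {
        const Link *k = &links[l];
        assert(k->ref < MAX_REF && ids[k->src] < MAX_NBH && ids[k->tgt] < MAX_NBH);
        /* (R, nbh_{i-1}(tgt)) joins out(src); (R, nbh_{i-1}(src)) joins in(tgt) */
        d[k->src].out |= UINT64_C(1) << (k->ref * MAX_NBH + ids[k->tgt]);
        d[k->tgt].in |= UINT64_C(1) << (k->ref * MAX_NBH + ids[k->src]);
    }
    for (int o = 0; o < n_obj; o++) { /* intern descriptors into dense ids */
        int t = 0;
        while (t < count && !(table[t].prev == d[o].prev &&
                              table[t].in == d[o].in && table[t].out == d[o].out))
            t++;
        if (t == count) table[count++] = d[o];
        ids[o] = t;
    }
    return count;
}

/* Fills sizes[i] = |S_i(M)| for i = 0..max_range; classes[o] is a class bit set. */
static void shape_sizes(int n_obj, const unsigned *classes, const Link *links,
                        int n_links, int max_range, int *sizes) {
    int ids[MAX_OBJ];
    unsigned seen[MAX_OBJ];
    int count = 0;
    assert(n_obj <= MAX_OBJ);
    for (int o = 0; o < n_obj; o++) { /* range 0: nbh_0(o) is the set of classes */
        int t = 0;
        while (t < count && seen[t] != classes[o]) t++;
        if (t == count) seen[count++] = classes[o];
        ids[o] = t;
    }
    sizes[0] = count;
    for (int i = 1; i <= max_range; i++)
        sizes[i] = refine(n_obj, links, n_links, ids);
}
```

For instance, encoding the six objects of Example 3 with a hypothetical wiring t1: e to s1, t2: s1 to s2 and t3: s2 to s1 (the exact links of Fig. 2 are not given in the text) yields 3 neighbourhoods at range 0 and 4 at range 1, consistent with $d^{int}_1(M_1) = 4/6$ in Example 4. The identifiers are local to one model; comparing shapes across models would require a shared intern table.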


Example 3. We illustrate the concept of graph shapes for model $M_1$. For range
0, objects are mapped to class names as neighborhood descriptors: 


- $nbh_0(e) = \{Entry, PseudoState, Vertex\}$
- $nbh_0(t1) = nbh_0(t2) = nbh_0(t3) = \{Transition\}$
- $nbh_0(s1) = nbh_0(s2) = \{State, RegularState, Vertex\}$


For range 1, objects with different incoming or outgoing types are further split, 
e.g. the neighborhood of t1 is different from that of t2 and t3 as it is connected 
to an Entry along a source reference, while the sources of t2 and t3 are States.


- $nbh_1(t1) = (\{Transition\}, \emptyset, \{(source, \{Entry, PseudoState, Vertex\}), (target, \{State, RegularState, Vertex\})\})$

- $nbh_1(t2) = (\{Transition\}, \emptyset, \{(source, \{State, RegularState, Vertex\}), (target, \{State, RegularState, Vertex\})\}) = nbh_1(t3)$


For range 2, each object of $M_1$ would be mapped to a unique element. In
Fig. 4, the neighborhood shapes of models $M_1$, $M_2$, and $M_3$ for range 1 are
represented in a visual notation adapted from [28,29] (without additional annotations,
e.g. multiplicities or predicates used for verification purposes). The trace of the
concrete graph nodes to neighbourhoods is illustrated on the right. For instance,
the Entries e1 and e2 in $M_1$ and $M_2$ are both mapped to the same neighbourhood
n1, while e3 can be distinguished from them as it has an incoming reference from
a transition, thus creating a different neighbourhood n5.

Fig. 4. Sample neighborhood shapes of $M_1$, $M_2$ and $M_3$


Properties of Graph Shapes. The theoretical foundations of graph shapes 
[28,29] prove several key semantic properties which are exploited in this paper: 


P1 There is only a finite number of graph shapes in a certain range, and a
smaller range reduces the number of graph shapes, i.e. $|S_i(M)| \le |S_{i+1}(M)|$.
P2 $|S_i(M_j)| + |S_i(M_k)| \ge |S_i(M_j \cup M_k)| \ge \max(|S_i(M_j)|, |S_i(M_k)|)$.


3.2 Metrics for Model Diversity 


We define two metrics for model diversity based upon neighborhood shapes. 
Internal diversity captures the diversity of a single model, i.e. it can be evalu- 
ated individually for each and every generated model. As neighborhood shapes 
introduce extra subtypes for objects, this model diversity metric measures the 
number of neighborhood types used in the model with respect to the size of the 
model. External diversity captures the distance between pairs of models. Infor- 
mally, this diversity distance between two models will be proportional to the 
number of different neighborhoods covered in one model but not the other. 


Definition 1 (Internal model diversity). For a range i of neighborhood
shapes for model M, the internal diversity of M is the number of shapes wrt. the
size of the model: $d^{int}_i(M) = |S_i(M)|/|M|$.


The range of this internal diversity metric $d^{int}_i(M)$ is [0..1], and a model M
with $d^{int}_i(M) = 1$ (and $|M| > |MM|$) guarantees full metamodel coverage [45],
i.e. it surely contains all elements from a metamodel as types. As such, it is
an appropriate diversity metric for a model in the sense of [43]. Furthermore,
given a specific range i, the number of potential neighborhood shapes within
that range is finite, but it grows superexponentially. Therefore, for a small range
i, one can derive a model $M_s$ with $d^{int}_i(M_s) = 1$, but for larger models $M_l$
(with $|M_l| > |M_s|$) we will likely have $d^{int}_i(M_s) > d^{int}_i(M_l)$. However, due
to the rapid growth of the number of shapes for increasing range i, for most practical
cases, $d^{int}_i(M_l)$ will converge to 1 if $M_l$ is sufficiently diverse.
Definition 2 (External model diversity). Given a range i of neighborhood
shapes, the external diversity of models $M_j$ and $M_k$ is the number of shapes
contained exclusively in $M_j$ or $M_k$ but not in the other, formally,
$d^{ext}_i(M_j, M_k) = |S_i(M_j) \ominus S_i(M_k)|$, where $\ominus$ denotes the symmetric difference of two sets.


External model diversity allows us to compare two models. One can show that
this metric is a (pseudo-)distance in the mathematical sense [2], and thus it can
serve as a diversity metric for a model generator in accordance with [43].


Definition 3 (Pseudo-distance). A function $d: M \times M \to \mathbb{R}$ is called a
(pseudo-)distance if it satisfies the following properties:


- d is non-negative: $d(M_j, M_k) \ge 0$

- d is symmetric: $d(M_j, M_k) = d(M_k, M_j)$

- if $M_j$ and $M_k$ are isomorphic, then $d(M_j, M_k) = 0$

- triangle inequality: $d(M_j, M_l) \le d(M_j, M_k) + d(M_k, M_l)$


Corollary 1. External model diversity $d^{ext}_i(M_j, M_k)$ is a (pseudo-)distance
between models $M_j$ and $M_k$ for any i.
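A short derivation sketch (our own, not reproduced from [2]) of the only non-obvious property, the triangle inequality. Write $A = S_i(M_j)$, $B = S_i(M_k)$ and $C = S_i(M_l)$, and take $x \in A \setminus C$: if $x \notin B$ then $x \in A \setminus B \subseteq A \ominus B$, otherwise $x \in B \setminus C \subseteq B \ominus C$; the case $x \in C \setminus A$ is symmetric. Hence

$$
\begin{aligned}
A \ominus C &\subseteq (A \ominus B) \cup (B \ominus C) \\
d^{ext}_i(M_j, M_l) = |A \ominus C| &\le |A \ominus B| + |B \ominus C| = d^{ext}_i(M_j, M_k) + d^{ext}_i(M_k, M_l)
\end{aligned}
$$

Non-negativity and symmetry follow directly from the definition of $\ominus$, and isomorphic models have identical shapes because $nbh_i$ refers only to classes and references, never to object identities, so their distance is 0.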


During model generation, we will exclude a model $M_k$ if $d^{ext}_i(M_j, M_k) = 0$ for
a previously derived model $M_j$, even though this does not imply that they are isomorphic.
Thus our definition allows us to avoid graph isomorphism checks between $M_j$ and
$M_k$, which have high computational complexity. Note that external diversity is
a dual of symmetry breaking predicates [39] used in the Alloy Analyzer, where
$d(M_j, M_k) = 0$ implies that $M_j$ and $M_k$ are isomorphic (and not vice versa).


Definition 4 (Coverage of model set). Given a range i of neighborhood
shapes and a set of models $MS = \{M_1, \dots, M_k\}$, the coverage of this model
set is defined as $cov_i(MS) = |S_i(M_1) \cup \dots \cup S_i(M_k)|$.


The coverage of a model set is not normalised, but its value grows monotonically
(it never decreases) for any range i as new models are added. Thus it corresponds to
our expectation that adding a new test case to a test suite should increase its coverage.


Example 4. Let us calculate the different diversity metrics for $M_1$, $M_2$ and $M_3$
of Fig. 2. For range 1, they have the shapes illustrated in Fig. 4. The internal
diversity of those models is $d^{int}_1(M_1) = 4/6$, $d^{int}_1(M_2) = 4/8$ and
$d^{int}_1(M_3) = 6/7$, thus $M_3$ is the most diverse model among them. As $M_1$ and
$M_2$ have the same shape, the distance between them is $d^{ext}_1(M_1, M_2) = 0$. The
distance between $M_1$ and $M_3$ is $d^{ext}_1(M_1, M_3) = 4$, as $M_1$ has 1 different
neighbourhood (n1), and $M_3$ has 3 (n5, n6 and n7). The set coverage of $M_1$, $M_2$
and $M_3$ is 7 altogether, as they have 7 different neighbourhoods (n1 to n7).
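Once shapes are available, all three metrics reduce to set arithmetic. The C sketch below is an illustration under our own encoding, not the authors' implementation: a shape is a bit set in which bit k-1 stands for neighbourhood n_k, so the symmetric difference of Definition 2 becomes a bitwise XOR and the union of Definition 4 a bitwise OR. The names `Shape`, `NBH`, `shape_size`, `internal_diversity`, `external_diversity` and `coverage` are hypothetical, and the fact that $M_1$ and $M_3$ share n2 to n4 is inferred from the counts in Example 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of a shape S_i(M): bit k-1 is set iff neighbourhood n_k occurs in M. */
typedef uint32_t Shape;
#define NBH(k) ((Shape)1u << ((k) - 1))

/* Number of neighbourhoods in a shape, |S_i(M)|. */
static int shape_size(Shape s) {
    int c = 0;
    for (; s != 0; s &= s - 1) c++; /* clear the lowest set bit each round */
    return c;
}

/* Definition 1: d_int_i(M) = |S_i(M)| / |M|, with |M| the number of objects. */
static double internal_diversity(Shape s, int n_objects) {
    assert(n_objects > 0);
    return (double)shape_size(s) / n_objects;
}

/* Definition 2: d_ext_i(Mj, Mk) = |S_i(Mj) (-) S_i(Mk)|; XOR is the symmetric difference. */
static int external_diversity(Shape a, Shape b) { return shape_size(a ^ b); }

/* Definition 4: cov_i(MS) = |S_i(M1) u ... u S_i(Mk)|. */
static int coverage(const Shape *shapes, int k) {
    Shape u = 0;
    for (int j = 0; j < k; j++) u |= shapes[j];
    return shape_size(u);
}
```

With $S_1(M_1) = S_1(M_2) = \{n1, \dots, n4\}$ and $S_1(M_3) = \{n2, \dots, n7\}$, `external_diversity` gives 0 for $(M_1, M_2)$ and 4 for $(M_1, M_3)$, and `coverage` over all three shapes gives 7, matching the values above.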


4 Iterative Generation of Diverse Models 


Now we aim at generating a diverse set of models $MS = \{M_1, M_2, \dots, M_k\}$ for a
given metamodel MM (and potentially, a set of constraints WF). Our approach
(see Fig. 5) intentionally reuses several components as building blocks obtained
from existing research results aiming to derive consistent graph models. First,
model generation is an iterative process where previous solutions serve as further
constraints [35]. Second, it repeatedly calls a back-end graph solver [33,44] to
automatically derive consistent instance models which satisfy WF.


[Figure: flowchart with inputs Constraints (WF) and Metamodel (MM); steps "Create new partial model" and "Calculate partial neighborhood shape"; decisions "(Consistent) model?", "Same shape exists within R?" and "Need/has more models?"; output: result set.]

Fig. 5. Generation of diverse models


As a key conceptual novelty, we enforce the structural diversity of models 
during the generation process using neighborhood shapes at different stages. 
Most importantly, if the shape $S_i(M_n)$ of a new instance model $M_n$ obtained
as a candidate solution is identical to the shape $S_i(M_j)$ of a previously derived
model $M_j$ for a predefined (input) neighborhood range i, the solution candidate
is discarded, and iterative generation continues towards a new candidate.

Internally, our tool operates over partial models [30,34] where instance mod- 
els are derived along a refinement calculus [43]. The shapes of intermediate (par- 
tial) models found during model generation are continuously being computed. 
As such, they may help guide the search process of model generation by giving 
preference to refining (partial) model candidates that likely result in a different
graph shape. Furthermore, this extra bookkeeping also pays off once a model 
candidate is found since comparing two neighborhood shapes is fast (conceptu- 
ally similar to lexicographical ordering). However, our concepts could be adapted 
to postprocess the output of other (black-box) model generator tools. 


Example 5. As an illustration of the iterative generation of diverse models, let us
imagine that model $M_1$ (in Fig. 2) is retrieved first by a model generator. Shape
$S_2(M_1)$ is then calculated (see Fig. 4), and since there are no other models with
the same shape, $M_1$ is stored as a solution. If the model generator retrieves
$M_2$ as the next solution candidate, it turns out that $S_2(M_2) = S_2(M_1)$, thus
$M_2$ is excluded. Next, if model $M_3$ is generated, it will be stored as a solution
since $S_2(M_3) \neq S_2(M_1)$. Note that we intentionally omitted the internal search
procedure of the model generator to focus on the use of neighborhood shapes.
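As noted above, the concepts could also post-process the output of a black-box generator. The C sketch below shows only that post-processing variant (the guidance of the search by partial shapes is not modelled): a candidate is stored only if its shape differs from the shape of every stored solution, i.e. its external diversity to each of them is positive. Shapes are again a hypothetical bit-set encoding over neighbourhood identifiers, the candidate array stands in for successive solver calls, and `ResultSet`, `offer` and `collect` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RESULTS 64

/* Hypothetical shape encoding: bit k-1 is set iff neighbourhood n_k occurs. */
typedef uint32_t Shape;

typedef struct {
    Shape shapes[MAX_RESULTS]; /* shapes S_i of the stored solutions */
    int ids[MAX_RESULTS];      /* position of each stored solution in the stream */
    int count;
} ResultSet;

/* Stores the candidate and returns 1 if no stored solution has the same shape;
   otherwise the candidate is discarded and 0 is returned. */
static int offer(ResultSet *r, int candidate, Shape s) {
    for (int j = 0; j < r->count; j++)
        if (r->shapes[j] == s) return 0; /* d_ext = 0: same shape, discard */
    assert(r->count < MAX_RESULTS);
    r->shapes[r->count] = s;
    r->ids[r->count] = candidate;
    r->count++;
    return 1;
}

/* Walks the candidate stream until `wanted` diverse models are stored. */
static int collect(const Shape *stream, int n, int wanted, ResultSet *r) {
    r->count = 0;
    for (int c = 0; c < n && r->count < wanted; c++)
        offer(r, c, stream[c]);
    return r->count;
}
```

Feeding the stream $M_1, M_2, M_3$ of Example 5, with $M_2$ encoded by the same shape as $M_1$, stores candidates 0 and 2, i.e. $M_1$ and $M_3$. Comparing canonical encodings for equality avoids any isomorphism check, which is the point made about Definition 2.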


Finally, it is worth highlighting that graph shapes are conceptually different 
from other approaches aiming to achieve diversity. Approaches relying upon 
object identifiers (like [38]) may classify two isomorphic graphs as
different. Sampling-based approaches [17] attempt to derive non-isomorphic
models on a statistical basis, but there is no formal guarantee that two models
are non-isomorphic. The Alloy Analyzer [39] uses symmetry breaking predicates
as sufficient conditions of isomorphism (i.e. two models are surely isomorphic).
Graph shapes provide a necessary condition for isomorphism, i.e. two models with
different shapes are surely non-isomorphic, while if two non-isomorphic models
have identical shapes, one of them is discarded.


5 Evaluation 


In this section, we provide an empirical evaluation of our diversity metrics and 
model generation technique to address the following research questions: 


RQ1: How effective is our technique in creating diverse models for testing? 
RQ2: How effective is our technique in creating diverse test suites? 
RQ3: Is there a correlation between diversity metrics and mutation score?


Target Domain. In order to answer those questions, we executed model gen- 
eration campaigns on a DSL extracted from Yakindu Statecharts (as proposed 
in [35]). We used the partial metamodel describing the state hierarchy and tran- 
sitions of statecharts (illustrated in Fig.1, containing 12 classes and 6 refer- 
ences). Additionally, we formalized 10 WF constraints regulating the transitions 
as graph predicates, based on the built-in validation of Yakindu. 

For mutation testing, we used a constraint or negation omission operator (CO
and NO) to inject an error into the original WF constraint in every possible way,
which yielded 51 mutants from the original 10 constraints (but some mutants 
may never have matches). We checked both the original and mutated versions 
of the constraints for each instance model, and a model kills a mutant if there 
is a difference in the match set of the two constraints. The mutation score for a 
test suite (i.e. a set of models) is the total number of mutants killed that way. 
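The kill criterion and the suite-level score can be sketched in C as follows. This is our illustration only: it assumes that each match set has already been computed by a query engine and encoded as a bit set over the candidate matches of one model, and it packs the kills of one model into a 64-bit mask (enough for the 51 mutants used here). `MatchSet`, `kills` and `mutation_score` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: a match set as a bit set over the candidate matches
   (tuples of objects) of a single model. */
typedef uint64_t MatchSet;

/* Bit m of the result is set iff the model kills mutant m, i.e. the match set of
   mutant m differs from that of the original constraint origin_of[m] it was
   derived from. At most 64 mutants fit into the mask. */
static uint64_t kills(const MatchSet *original, const int *origin_of,
                      const MatchSet *mutated, int n_mutants) {
    uint64_t mask = 0;
    assert(n_mutants <= 64);
    for (int m = 0; m < n_mutants; m++)
        if (mutated[m] != original[origin_of[m]])
            mask |= UINT64_C(1) << m;
    return mask;
}

/* Mutation score of a test suite: mutants killed by at least one of its models. */
static int mutation_score(const uint64_t *kill_masks, int n_models) {
    uint64_t killed = 0;
    int score = 0;
    for (int j = 0; j < n_models; j++) killed |= kill_masks[j];
    for (; killed != 0; killed &= killed - 1) score++; /* count set bits */
    return score;
}
```

Taking the union of the per-model masks before counting ensures that a mutant killed by several models of a sequence is counted only once.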


Compared Approaches. Our test input models were taken from three different 
sources. First, we generated models with our iterative approach using a graph 
solver (GS) with different neighborhoods for ranges r = 1 to r=3. 

Next, we generated models for the same DSL using Alloy [39], a well-known 
SAT-based relational model finder. For representing EMF metamodels we used 
traditional encoding techniques [8,32]. To enforce model diversity, Alloy was 
configured with three different setups for symmetry breaking predicates: s = 0, 
s = 10 and s = 20 (default value). For greater values the tool produced the same 
set of models. We used the latest 4.2 build for Alloy with the default Sat4j [20] 
as back-end solver. All other configuration options were set to default. 

Finally, we included 1250 manually created statechart models in our anal- 
ysis (marked by Human). The models were created by students as solutions 
for similar (but not identical) statechart modeling homework assignments [43] 
representing real models which were not prepared for testing purposes. 


Measurement Setup. To address RQ1-RQ3, we created a two-step measurement
setup. In Step I, a set of instance models is generated with all GS and
Alloy configurations. Each tool in each configuration generated a sequence of
30 instance models produced by subsequent solver calls, and each sequence is
repeated 20 times (so 1800 models are generated for both GS and Alloy). In
case of Alloy, we prevented the deterministic run of the solver to enable statistical
analysis. The model generators were required to create metamodel-compliant instances
that satisfy the structural constraints of Subsect. 2.1 but ignore the WF
constraints. The target model size is set to 30 objects as Alloy did not scale with
increasing size (the scalability and the details of the back-end solver are reported
in [33]). The size of Human models ranges from 50 to 200 objects.

Fig. 6. Mutation scores and diversity properties of model sets

In Step II, we evaluate the mutation score for all the models (and for
the entire sequence) by comparing results for the mutant and original predicates
and record which mutant was killed by a model. We also calculate our diversity 
metrics for a neighborhood range where no more equivalence classes are produced 
by shapes (which turned out to be r = 7 in our case study). We calculated the 
internal diversity of each model, the external diversity (distance) between pairs 
of models in each model sequence, and the coverage of each model sequence. 


RQ1: Measurement Results and Analysis. Figure 6a shows the distribution 
of the number of mutants killed by at least one model from a model sequence (left 
box plot), and the distribution of internal diversity (right box plot). For killing 
mutants, GS was the best performer (regardless of the r range): most models 
found 36-41 mutants out of 51. On the other hand, Alloy performance varied 
based on the value of symmetry: for s=0, most models found 9-15 mutants 
(with a large number of positive outliers that found several errors). For s = 10, 
the average increased to over 20, but the number of positive outliers simultaneously
dropped. Finally, in default settings (s = 20) Alloy generated similar
models, and found only a low number of mutants. We also measured the effi- 
ciency of killing mutants by Human, which was between GS and Alloy. None 
of the instance models could find more than 41 mutants, which suggests that 
those mutants cannot be detected at all by metamodel-compliant instances. 

The right side of Fig.6a presents the internal diversity of models measured 
as shape nodes/graph nodes (for fixpoint range 7). The results are similar: the
diversity was high with low variance in GS with slight differences between ranges. 
In case of Alloy, the diversity is similarly affected by the symmetry value: 
s=0 produced low average diversity, but a high number of positive outliers. 
With s= 10, the average diversity increased with decreasing number of positive 
outliers. And finally, with the default s = 20 value the average diversity was low. 
The internal diversity of Human models is between that of GS and Alloy.
Fig. 7. Mutation score and set coverage for model sequences


Figure 6b illustrates the average distance between all model pairs generated 
in the same sequence (vertical axis) for range 7. The distribution of external 
diversity also shows similar characteristics as Fig. 6a: GS provided high diversity 
for all ranges (56 out of the maximum 60), while the diversity between models 
generated by Alloy varied based on the symmetry value. 

As a summary, our model generation technique consistently outperformed 
Alloy wrt. both the diversity metrics and mutation score for individual models. 


RQ2: Measurement Results and Analysis. Figure 7a shows the number 
of killed mutants (vertical axis) by an increasing set of models (with 1 to 30 
elements; horizontal axis) generated by GS or Alloy. The diagram shows the 
median of 20 generation runs to exclude the outliers. GS found a large number of
mutants in the first model, and the number of killed mutants (36-37) increased
to 41 by the 17th model, after which no further mutants were found. Again,
our measurement showed little difference between ranges r=1, 2 and 3. For
Alloy, the result highly depends on the symmetry value: for s=0 it found a
large number of mutants, but the value saturated early. Next, for s=10, the
first model found significantly fewer mutants, and the number increased rapidly
for the first 5 models, but altogether fewer mutants were killed than for s = 0.
Finally, the default configuration (s = 20) found the least number of mutants. 
In Fig. 7b, the average coverage of the model sets is calculated (vertical axis)
for increasing model sets (horizontal axis). The neighborhood shapes are calculated
for r = 0 to 5, beyond which no significant difference is shown. Again,
configurations of symmetry breaking predicates resulted in different characteristics
for Alloy. However, the number of shape nodes investigated by the test set
was significantly higher in case of GS (791 vs. 200 equivalence classes) regardless
of the range, and it was monotonically increasing as new models were added.
Altogether, both mutation score and equivalence class coverage of a model
sequence were much better for our model generator approach compared to Alloy.
RQ3: Analysis of Results. Figure 8 illustrates the correlation between mutation
score (horizontal axis) and internal diversity (vertical axis) for all generated
and human models in all configurations. Considering all models (1800 Alloy,
1800 GS, 1250 Human), mutation score and internal diversity show a high
correlation of 0.95, while the correlation was low (0.12) for the Human models alone.
Fig. 8. Model diversity and mutation score correlation


Our initial investigation suggests that high internal diversity provides a
good mutation score, thus our metrics can potentially be good predictors in a
testing context, but we cannot generalize this to a full statistical correlation.


Threats to Validity and Limitations. We evaluated more than 4850 test 
inputs in our measurement, but all models were taken from a single domain 
of Yakindu statecharts with a dedicated set of WF constraints. However, our 
model generation approach did not use any special property of the metamodel 
or the WF constraints, thus we believe that similar results would be obtained for 
other domains. For mutation operations, we checked only omission of predicates, 
as extra constraints could easily yield infeasible predicates due to inconsistency 
with the metamodel, thus further reducing the number of mutants that can be 
killed. Finally, although we detected a strong correlation between diversity and 
mutation score with our test cases, this result cannot be generalized to statistical 
causality, because the generated models were not random samples taken from 
the universe of models. Thus additional investigations are needed to justify this 
correlation, and we only state that if a model is generated by either GS or Alloy, 
a higher diversity means a higher mutation score with high probability. 


6 Related Work 


Diverse model generation plays a key role in testing model transformations,
code generators and complete development environments [25]. Mutation-based
approaches [1,11,22] take existing models and make random changes on them 
by applying mutation rules. A similar random model generator is used for exper- 
imentation purposes in [3]. Other automated techniques [7,12] generate models 
that only conform to the metamodel. While these techniques scale well for larger 
models, there is no guarantee that the mutated models are well-formed.
There is a wide set of model generation techniques which provide certain
promises for test effectiveness. White-box approaches [1,6,14,15,31,32] rely on
the implementation of the transformation and dominantly use back-end logic 
solvers, which lack scalability when deriving graph models. 

Scalability and diversity of solver-based techniques can be improved by iter- 
atively calling the underlying solver [19,35]. In each step a partial model is 
extended with additional elements as a result of a solver call. Higher diversity is 
achieved by avoiding the same partial solutions. As a downside, generation steps 
need to be specified manually, and higher diversity can be achieved only if the 
models are decomposable into separate well-defined partitions. 

Black-box approaches [8,13,15,23] can only exploit the specification of the 
language or the transformation, so they frequently rely upon contracts or model 
fragments. As a common theme, these techniques may generate a set of simple 
models, and while certain diversity can be achieved by using symmetry-breaking 
predicates, they fail to scale for larger sizes. In fact, the effective diversity of 
models is also questionable since corresponding safety standards prescribe much 
stricter test coverage criteria for software certification and tool qualification than 
those currently offered by existing model transformation testing approaches. 

Based on the logic-based Formula solver, the approach of [17] applies stochas- 
tic random sampling of output to achieve a diverse set of generated models by 
taking exactly one element from each equivalence class defined by graph isomor- 
phism, which can be too restrictive for coverage purposes. Stochastic simulation 
is proposed for graph transformation systems in [40], where rule application is 
stochastic (and not the properties of models), but fulfillment of WF constraints 
can only be assured by a carefully constructed rule set. 


7 Conclusion and Future Work 


We proposed novel diversity metrics for models based on neighbourhood shapes 
[28], which are true generalizations of metamodel coverage and graph isomorphism
used in many research papers. Moreover, we presented a model generation
technique to derive structurally diverse models by (i) calculating the shape
of the previous solutions, and (ii) feeding it back to an existing generator to avoid
similar instances, thus ensuring high diversity between the models. The proposed
generator is available as an open source tool [44]. 

We evaluated our approach in a mutation testing scenario for Yakindu Stat- 
echarts, an industrial DSL tool. We compared the effectiveness (mutation score) 
and the diversity metrics of different test suites derived by our approach and 
an Alloy-based model generator. Our approach consistently outperformed the 
Alloy-based generator for both a single model and the entire test suite. Moreover,
we found that high (internal) diversity values normally result in a high mutation
score, thus highlighting the practical value of the proposed diversity metrics.

Conceptually, our approach can be adapted to an Alloy-based model gener- 
ator by adding formulae obtained from previous shapes to the input specifica- 
tion. However, our initial investigations revealed that such an approach does not 


242 O. Semerath and D. Varró 


scale well with increasing model size. While Alloy has been used as a model gen- 
erator for numerous testing scenarios of DSL tools and model transformations 
[6,8,35,36,42], our measurements strongly indicate that it is not a justified choice 
as (1) Alloy is very sensitive to configurations of symmetry breaking predicates 
and (2) the diversity and mutation score of generated models are problematic.
Abstract. Spectrum-based fault localisation determines how suspicious
a line of code is with respect to being faulty as a function of a given test 
suite. Outstanding problems include identifying properties that the test 
suite should satisfy in order to improve fault localisation effectiveness 
subject to a given measure, and developing methods that generate these 
test suites efficiently. 

We address these problems as follows. First, when single bug optimal 
measures are being used with a single-fault program, we identify a formal 
property that the test suite should satisfy in order to optimise fault local- 
isation. Second, we introduce a new method which generates test data 
that satisfies this property. Finally, we empirically demonstrate the utility
of our implementation at fault localisation on SV-COMP benchmarks
and the tcas program, demonstrating that test suites can be generated 
in almost a second with a fault identified after inspecting under 1% of 
the program. 


Keywords: Software quality · Spectrum-based fault localisation · Debugging


1 Introduction 


Faulty software is estimated to cost the US economy 60 billion dollars per
year [1] and has been single-handedly responsible for major newsworthy
catastrophes. This problem is exacerbated by the fact that debugging (defined as
the process of finding and rectifying a fault) is complex and time consuming:
it is estimated to consume 50-60% of the time a programmer spends in the
maintenance and development cycle [2]. Consequently, the development of effective
and efficient methods for software fault localisation has the potential to greatly
reduce costs, wasted programmer time and the possibility of catastrophe.

In this paper, we advance the state of the art in lightweight fault localisation 
by building on research in spectrum-based fault localisation (SBFL). SBFL is one 
of the most prominent areas of software fault localisation research, estimated to
make up 35% of published work in the field to date [3], and has been demon- 
strated to be efficient and effective at finding faults [4-12]. The effectiveness relies 
on two factors, (1) the quality of the measure used to identify the lines of code 
that are suspected to be faulty, and (2) the quality of the test suite used. Most 
research in the field has been focussed on finding improved measures [4-12], but 
there is a growing literature on how to improve the quality of test suites [13-20]. 
An outstanding problem in this field is to identify the properties that test suites 
should satisfy to improve fault localisation. 

To address this problem, we focus our attention on improving the quality of 
test suites for the purposes of fault localisation on single-fault programs. Pro- 
grams with a single fault are of special interest, as a recent study demonstrates 
that 82% of faulty programs could be repaired with a “single fix” [21], and that 
“when software is being developed, bugs arise one-at-a-time and therefore can be 
considered as single-faulted scenarios”, suggesting that methods optimised for 
use with single-fault programs would be most helpful in practice. Accordingly, 
the contributions of this paper are as follows. 


1. We identify a formal property that a test suite must satisfy in order to be 
optimal for fault localisation on a single-fault program when a single-fault 
optimal SBFL measure is being used. 

2. We provide a novel algorithm which generates data that is formally shown to 
satisfy this property. 

3. We integrate this algorithm into an implementation which leverages model 
checkers to generate small test suites, and empirically demonstrate its prac- 
tical utility at fault localisation on our benchmarks. 


The rest of this paper is organized as follows. In Sect. 2, we present the formal 
preliminaries for SBFL and our approach. In Sect. 3, we motivate and describe
a property of single-fault optimality. In Sect. 4, we present an algorithm which
generates data for a given faulty program, and prove that the data generated sat- 
isfies the property of single fault optimality, and in Sect. 5 discuss implementation 
details. In Sect. 6 we present our experimental results where we demonstrate the 
utility of an implementation of our algorithm on our benchmarks, and in Sect. 7 
we present related work. 


2 Preliminaries 


In this section we formally present the preliminaries for understanding our fault 
localisation approach. In particular, we describe probands, proband models, and 
SBFL. 


2.1 Probands 


Following the terminology in Steimann et al. [22], a proband is a faulty program 
together with its test suite, and can be used for evaluating the performance of 


248 D. Landsberg et al. 


    #include <assert.h>

    int main() {
      int input1, input2, input3; // C1
      int least = input1;
      int most = input1;
      if (most < input2)
        most = input2;            // C2
      if (most < input3)
        most = input3;            // C3
      if (least > input2)
        most = input2;            // C4 (bug)
      if (least > input3)
        least = input3;           // C5
      assert(least <= most);      // E
    }

Fig. 1. minmax.c

           C1  C2  C3  C4  C5  E
    t1      1   0   1   1   0  1
    t2      1   0   0   1   1  1
    t3      1   0   0   1   0  1
    t4      1   1   0   0   0  0
    t5      1   0   1   0   0  0
    t6      1   0   0   0   1  0
    t7      1   0   0   1   1  0
    t8      1   0   0   0   0  0
    t9      1   1   0   0   1  0
    t10     1   1   1   0   0  0

Fig. 2. Coverage matrix


a given fault localization method. A faulty program is a program that fails to 
always satisfy a specification, which is a property expressible in some formal 
language and describes the intended behaviour of some part of the program 
under test (PUT). When a specification fails to be satisfied for a given execution 
(i.e., an error occurs), it is assumed that there exist some (incorrectly written) lines
of code in the program which were the cause of the error, identified as a fault
(aka bug).


Example 1. An example of a faulty C program is given in Fig. 1 (minmax.c, taken 
from Groce et al. [23]), and we shall use it as our running example throughout 
this paper. There are some executions of the program in which the assertion 
statement least <= most is violated, and thus the program fails to always sat- 
isfy the specification. The fault in this example is labelled C4, which should be 
an assignment to least instead of most. 


A test suite is a collection of test cases whose result is independent of the 
order of their execution, where a test case is an execution of some part of a 
program. Each test case is associated with an input vector, where the n-th value 
of the vector is assigned to the n-th input of the given program for the purposes 
of a test (according to some given method of assigning values in the vector to 
inputs in the program). Each test suite is associated with a set of input vectors 
which can be used to generate the test cases. A test case fails (or is failing) if it 
violates a given specification, and passes (or is passing) otherwise. 


Example 2. We give an example of a test case for the running example. The 
test case with associated input vector (0,1,2) is an execution in which input1 
is assigned 0, input2 is assigned 1, and input3 is assigned 2, the statements
labeled C1, C2 and C3 are executed, but C4 and C5 are not executed, and the 
assertion is not violated at termination, as least and most assume values of 0 
and 2 respectively. Accordingly, we may associate a collection of test cases (a test 
suite) with a set of input vectors. For the running example the following ten 
input vectors are associated with a test suite of ten test cases: (1,0,2), (2,0, 1), 
(2,0,2), (0,1,0), (0,0,1), (1,1,0), (2,0,0), (2,2,2), (1,2,0), and (0,1,2). Here, 
the first three input vectors result in error (and thus their associated test cases 
are failing), and the last seven do not (and thus their associated test cases are 
passing). 


A unit under test (UUT) is a concrete artifact in a program which is a candidate
for being at fault. Many types of UUTs have been defined and used in
the literature, including methods [24], blocks [25,26], branches [16], and state- 
ments [27-29]. A UUT is said to be covered by a test case just in case that test 
case executes the UUT. For convenience, it will help to always think of UUTs as 
being labeled C1, C2, ... etc. in the program itself (as they are in the running 
example). Assertion statements are not considered to be UUTs, and we assume 
that each fault in the program has a corresponding UUT. 


Example 3. To illustrate some UUTs for the running example (Fig. 1), we have cho- 
sen the units under test to be the statements labeled in comments marked C1, ..., 
C5. The assertion is labeled E, which is violated when an error occurs. To illustrate 
a proband, the faulty program minmax.c (described in Example 1), and the test 
suite associated with the input vectors described in Example 2, together form a 
proband. 


2.2 Proband Models 


In this section we define proband models, which are the principal formal objects
used in SBFL. Informally, a proband model is a mathematical abstraction of a 
proband. We assume the existence of a given proband in which the UUTs have 
already been identified for the faulty program and appropriately labeled C1, ..., 
Cn, and assume a total of n UUTs. We begin as follows. 


Definition 1. A set of coverage vectors, denoted by T, is a set $\{t_1, \ldots, t_r\}$
in which each $t_k \in T$ is a coverage vector defined as
$t_k = (c_1^k, \ldots, c_n^k, c_{n+1}^k, k)$, where


- for all $0 < i \le n$, $c_i^k = 1$ if the i-th UUT is covered by the test case
associated with $t_k$, and 0 otherwise;
- $c_{n+1}^k = 1$ if the test case associated with $t_k$ fails, and 0 if it passes.


We also call a set of coverage vectors T the fault localisation data or a dataset.
Intuitively, each coverage vector can be thought of as a mathematical abstraction
of an associated test case which describes which UUTs were executed/covered in
that test case. We also use the following additional notation. If the last argument
of a coverage vector in T is the number k, it is denoted $t_k$, where k uniquely
identifies a coverage vector in T and the corresponding test case in the associated
test suite. In general, for each $t_k \in T$, $c_i^k$ is a coverage variable and gives
the value of the i-th argument in $t_k$. If $c_{n+1}^k = 1$, then $t_k$ is called a
failing coverage vector, and passing otherwise. The set of failing coverage
vectors/the event of an error is denoted E (such that the set of passing vectors is
then $\overline{E}$). Element $c_{n+1}^k$ is also denoted $e^k$ (as it describes whether
the error occurred). For convenience, we may represent the set of coverage vectors
T with a coverage matrix, where for all $0 < i \le n$ and $t_k \in T$ the cell
intersecting the i-th column and k-th row is $c_i^k$ and represents whether the
i-th UUT was covered in the test case corresponding to $t_k$. The cell intersecting
the last column and k-th row is $e^k$ and represents whether $t_k$ is a failing or
passing test case. Fig. 2 is an example coverage matrix.
In practice, given a program and an input vector, one can extract coverage
information from an associated test case using established tools².


Example 4. For the test suite given in Example 2 we can devise a set of coverage
vectors $T = \{t_1, \ldots, t_{10}\}$ in which $t_1 = (1,0,1,1,0,1,1)$,
$t_2 = (1,0,0,1,1,1,2)$, $t_3 = (1,0,0,1,0,1,3)$, $t_4 = (1,1,0,0,0,0,4)$,
$t_5 = (1,0,1,0,0,0,5)$, $t_6 = (1,0,0,0,1,0,6)$, $t_7 = (1,0,0,1,1,0,7)$,
$t_8 = (1,0,0,0,0,0,8)$, $t_9 = (1,1,0,0,1,0,9)$, and $t_{10} = (1,1,1,0,0,0,10)$.
Here, coverage vector $t_k$ is associated with the k-th input vector in the list in
Example 2. To illustrate how input and coverage vectors relate, we observe that
$t_{10}$ is associated with a test case with input vector (0,1,2) which executes the
statements labeled C1, C2 and C3, does not execute the statements labeled C4 and
C5, and does not result in error. Consequently $c_1^{10} = c_2^{10} = c_3^{10} = 1$,
$c_4^{10} = c_5^{10} = e^{10} = 0$, and k = 10, such that $t_{10} = (1,1,1,0,0,0,10)$
(by the definition of coverage vectors). The coverage matrix representing T is
given in Fig. 2.
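To make the link between input vectors and coverage vectors concrete, the following C sketch runs an instrumented copy of minmax.c and records which labeled statements execute. It is an illustration under our own assumptions, not the tooling used in the paper (which extracts coverage with established tools): the names `run_minmax` and `coverage_vector` are hypothetical, C1 is treated as unconditional code executed on every run, and the assertion E is recorded in `e` rather than aborting, so that failing runs can be observed.

```c
#include <assert.h>
#include <string.h>

#define N_UUT 5 /* UUTs C1..C5 of minmax.c */

/* A coverage vector t_k = (c_1^k, ..., c_5^k, e^k, k). */
typedef struct {
    int c[N_UUT]; /* c[i - 1] == 1 iff UUT Ci was executed */
    int e;        /* 1 iff the assertion least <= most is violated */
    int k;        /* index of the associated test case */
} coverage_vector;

/* Instrumented copy of minmax.c: each labeled statement records that it ran,
   and the assertion E is recorded in t.e instead of aborting the program. */
coverage_vector run_minmax(int input1, int input2, int input3, int k) {
    coverage_vector t;
    memset(&t, 0, sizeof t);
    t.k = k;
    t.c[0] = 1;                 /* C1: unconditional code, runs every time */
    int least = input1;
    int most = input1;
    if (most < input2)  { most = input2;  t.c[1] = 1; } /* C2 */
    if (most < input3)  { most = input3;  t.c[2] = 1; } /* C3 */
    if (least > input2) { most = input2;  t.c[3] = 1; } /* C4 (bug) */
    if (least > input3) { least = input3; t.c[4] = 1; } /* C5 */
    t.e = !(least <= most);                             /* E */
    return t;
}
```

Calling `run_minmax` on each of the ten input vectors of Example 2, with k running from 1 to 10, yields the rows of Fig. 2.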


Definition 2. Let T be a non-empty set of coverage vectors; then T's program
model PM is defined as the sequence $(C_1, \ldots, C_{|PM|})$, where for each $C_i \in PM$,
$C_i = \{t_k \in T \mid c_i^k = 1\}$.


We often use the notation $PM_T$ to denote the program model PM associated
with T. The final component $C_{|PM|}$ is also denoted E (denoting the event of
the error). Each member of a program model is called a program component or
event, and if $c_i^k = 1$ we say $C_i$ occurred in $t_k$, that $t_k$ covers $C_i$, and say
that $C_i$ is faulty just in case its corresponding UUT is faulty. Following the
definition above, each component $C_i$ is the set of vectors in which $C_i$ is covered,
and components obey set-theoretic relationships. For instance, for all components
$C_i, C_j \in PM$, we have $\forall t_k \in C_j.\ c_i^k = 1$ just in case $C_j \subseteq C_i$. In
general, we assume that E contains at least one coverage vector and each coverage
vector covers at least one component. Members of E and $\overline{E}$ are called
failing/passing vectors, respectively.


Example 5. We use the running example to illustrate a program model. For
the set of coverage vectors $T = \{t_1, \ldots, t_{10}\}$, we may define a program model
$PM = (C_1, C_2, C_3, C_4, C_5, E)$, where $C_1 = \{t_1, \ldots, t_{10}\}$,
$C_2 = \{t_4, t_9, t_{10}\}$, $C_3 = \{t_1, t_5, t_{10}\}$, $C_4 = \{t_1, t_2, t_3, t_7\}$,
$C_5 = \{t_2, t_6, t_7, t_9\}$, and $E = \{t_1, t_2, t_3\}$. Here, we may think of
$C_1, \ldots, C_5$ as events which occur just in case a corresponding UUT (lines of code
labeled C1, ..., C5 respectively) is executed, and E as an event which occurs just
in case the assertion least <= most is violated. $C_4$ is identified as the faulty
component.

2 For C programs Gcov can be used, available at http://www.gcovr.com.
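Since each component is just a set of test indices, one bitmask per component is a convenient encoding. The sketch below is a hypothetical helper (`program_model` and `is_subset` are our names, and the bitmask encoding is our choice) that builds PM from the Fig. 2 matrix, with bit k-1 standing for $t_k$, and checks the subset relation $C_j \subseteq C_i$ used above.

```c
#include <assert.h>
#include <stdint.h>

#define N_TESTS 10
#define N_COMP 6 /* C1..C5, then E as the final component */

/* Fig. 2: row k-1 is t_k; columns are C1..C5 followed by E. */
static const int matrix[N_TESTS][N_COMP] = {
    {1, 0, 1, 1, 0, 1}, {1, 0, 0, 1, 1, 1}, {1, 0, 0, 1, 0, 1},
    {1, 1, 0, 0, 0, 0}, {1, 0, 1, 0, 0, 0}, {1, 0, 0, 0, 1, 0},
    {1, 0, 0, 1, 1, 0}, {1, 0, 0, 0, 0, 0}, {1, 1, 0, 0, 1, 0},
    {1, 1, 1, 0, 0, 0},
};

/* Builds PM: component i (0-based) is the set {t_k | c_i^k = 1},
   stored as a bitmask in which bit k-1 stands for t_k. */
void program_model(uint32_t pm[N_COMP]) {
    for (int i = 0; i < N_COMP; i++) {
        pm[i] = 0;
        for (int k = 0; k < N_TESTS; k++)
            if (matrix[k][i])
                pm[i] |= UINT32_C(1) << k;
    }
}

/* C_j is a subset of C_i iff no vector in C_j lies outside C_i. */
int is_subset(uint32_t cj, uint32_t ci) {
    return (cj & ~ci) == 0;
}
```

With this encoding, the observation that the fault C4 covers every failing vector is simply `is_subset(pm[5], pm[3])`.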


Definition 3. For a given proband we define a proband model (PM, T), con- 
sisting of the given faulty program’s program model PM, and an associated test 
suite’s set of coverage vectors T. 


Finally, we extend our setup to distinguish between samples and populations.
The population test suite for a given program is a test suite consisting of all
possible test cases for the program; a sample test suite is a test suite consisting
of some (but not necessarily all) possible test cases for the program. All test
suites are sample test suites drawn from a given population. Let (PM, T) be a
given proband model for a given faulty program and sample test suite. We denote
the population vectors, corresponding to the population test suite of the given
faulty program, as $T^*$ (and $E^*$ and $\overline{E}^*$ as the population failing and
passing vectors in $T^*$ respectively). The population program model associated
with the population test suite is denoted $PM^*$ (aka $PM_{T^*}$), and $(PM^*, T^*)$ is
called the population proband model. Finally, we extend the use of asterisks to
make clear that the asterisked variable is associated with a given population.
Accordingly, each component in the population program model is superscripted
with a * to denote that it is a member of $PM^*$ (e.g., $C_i^*$), as is each vector in
the population set of vectors $T^*$ (e.g., $t_k^*$) and each coverage variable in each
vector $t_k^* \in T^*$ (e.g., $c_i^{k*}$).

It is assumed that for a given sample proband model (PM, T) and its population
proband model $(PM^*, T^*)$, we have $T \subseteq T^*$. Intuitively, this is because
a sample test suite is drawn from the population. In addition, for each $i \in \mathbb{N}$,
if $C_i \in PM$ and $C_i^* \in PM^*$, then $C_i \subseteq C_i^*$. Intuitively, this is because if
the i-th UUT is executed by a test case in the sample then it is executed by that
test case in the population.


2.3 Spectrum Based Fault Localisation 


We first define what a program spectrum is, as it serves as the principal formal
object used in spectrum based fault localization (SBFL). 


Definition 4. For each proband model (PM, T), and each component $C_i \in PM$,
a component's program spectrum is a vector
$(|C_i \cap E|, |\overline{C_i} \cap E|, |C_i \cap \overline{E}|, |\overline{C_i} \cap \overline{E}|)$.


Informally, $|C_i \cap E|$ is the number of failing coverage vectors in T that
cover $C_i$, $|\overline{C_i} \cap E|$ is the number of failing coverage vectors in T that
do not cover $C_i$, $|C_i \cap \overline{E}|$ is the number of passing coverage vectors in T
that cover $C_i$, and $|\overline{C_i} \cap \overline{E}|$ is the number of passing coverage
vectors in T that do not cover $C_i$. These four quantities are often denoted
$a_{ef}$, $a_{nf}$, $a_{ep}$, and $a_{np}$ respectively in the literature [4,7-12].
Example 6. For the proband model of the running example (PM, T) (where
$PM = (C_1, \ldots, C_5, E)$ and T is represented by the coverage matrix in Fig. 2),
the spectra for $C_1, \ldots, C_5$, and E are (3,0,7,0), (0,3,3,4), (1,2,2,5), (3,0,1,6),
(1,2,3,4), and (3,0,0,7) respectively.
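Computing a spectrum is a single pass over the coverage matrix. The following sketch (`compute_spectrum` and the `spectrum` struct are hypothetical names) counts the four quantities $a_{ef}$, $a_{nf}$, $a_{ep}$, $a_{np}$ for one component from its coverage column and the error column E. The test reproduces the spectra of $C_4$ and $C_5$ listed in Example 6.

```c
#include <assert.h>

/* (a_ef, a_nf, a_ep, a_np) = (|C n E|, |~C n E|, |C n ~E|, |~C n ~E|) */
typedef struct { int aef, anf, aep, anp; } spectrum;

/* Spectrum of one component from its coverage column cov[] and the
   error column err[] over n coverage vectors (entries are 0 or 1). */
spectrum compute_spectrum(const int *cov, const int *err, int n) {
    spectrum s = {0, 0, 0, 0};
    for (int k = 0; k < n; k++) {
        if (err[k]) {            /* failing vector */
            if (cov[k]) s.aef++; else s.anf++;
        } else {                 /* passing vector */
            if (cov[k]) s.aep++; else s.anp++;
        }
    }
    return s;
}
```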


Following Naish et al. [7], we define a suspiciousness measure as follows. 


Definition 5. A suspiciousness measure w is a function with signature
$w : PM \rightarrow \mathbb{R}$, and maps each $C_i \in PM$ to a real number as a function of
$C_i$'s program spectrum $(|C_i \cap E|, |\overline{C_i} \cap E|, |C_i \cap \overline{E}|, |\overline{C_i} \cap \overline{E}|)$,
where this number is called the component's degree of suspiciousness.


The higher/lower the degree of suspiciousness the more/less suspicious C; is 
assumed to be with respect to being a fault. A property of some SBFL measures 
is single-fault optimality [7,30]. Using our notation we can express this property 
as follows: 


Definition 6. A suspiciousness measure w is single-fault optimal if it satisfies
the following. For every program model PM and all $C_i, C_j \in PM$:


1. if $E \not\subseteq C_i$ and $E \subseteq C_j$, then $w(C_j) > w(C_i)$, and
2. if $E \subseteq C_i$, $E \subseteq C_j$, $|C_j \cap \overline{E}| = k$ and $|C_i \cap \overline{E}| < k$, then $w(C_i) > w(C_j)$.


Under the assumption that there is a single fault in the program, Naish 
et al. argue that a measure must have this property to be optimal [7]. Informally, 
the first condition demands that UUTs covered by all failing test cases are more sus- 
picious than anything else. The rationale here is that if there is only one faulty UUT 
in the program, then it must be executed by all failing test cases (otherwise there 
would be some failing test case which executes no fault — which is impossible given 
it is assumed that all errors are caused by the execution of some faulty UUT) [7,30]. 
The second demands that of two UUTs covered by all failing test cases, the one which 
is executed by fewer passing test cases is more suspicious. 

An example of a single-fault optimal measure is the Naish-I measure
$w(C_i) = a_{ef} - \frac{a_{ep}}{a_{ep} + a_{np} + 1}$ [31]. A framework that optimises any given
SBFL measure to be single-fault optimal was first given by Naish [31]. For any
suspiciousness measure w scaled from 0 to 1, we can construct the single-fault
optimised version of w (written $Opt_w$) as follows (here, we use the equivalent
formulation of Landsberg et al. [4]): $Opt_w(C_i) = a_{np} + 2$ if $a_{ef} = |E|$, and
$w(C_i)$ otherwise.
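Both measures above are direct to evaluate from a spectrum. The sketch below is a minimal illustration under our own naming: `naish1` follows the Naish-I formula above, `opt` wraps any measure w as in $Opt_w$, and `precision` ($a_{ef}/(a_{ef}+a_{ep})$) is only an illustrative [0,1]-scaled measure of our choosing to pass to `opt`; it is not discussed in the text.

```c
#include <assert.h>

typedef struct { int aef, anf, aep, anp; } spectrum;

/* Naish-I: a_ef - a_ep / (a_ep + a_np + 1). */
double naish1(spectrum s) {
    return s.aef - (double)s.aep / (s.aep + s.anp + 1);
}

/* Illustrative measure scaled to [0, 1] (our choice, not from the text):
   a_ef / (a_ef + a_ep), defined as 0 for a component that is never covered. */
double precision(spectrum s) {
    int covered = s.aef + s.aep;
    return covered ? (double)s.aef / covered : 0.0;
}

/* Opt_w: a_np + 2 when every failing vector covers the component
   (a_ef = |E|), and w itself otherwise. w must return values in [0, 1]. */
double opt(double (*w)(spectrum), spectrum s) {
    int failing = s.aef + s.anf; /* |E| */
    return s.aef == failing ? s.anp + 2.0 : w(s);
}
```

Because the first branch always returns at least 2 and w never exceeds 1, any component covered by all failing vectors outranks every other component, and among those, fewer passing executions (larger $a_{np}$) means higher suspiciousness, matching the two conditions of Definition 6.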

We now describe the established SBFL algorithm [4,7-12]. The method produces
a list of program component indices ordered by suspiciousness, as a function of
the set of coverage vectors T (taken from a proband model (PM, T)) and a
suspiciousness measure w. As the algorithm is simple, we describe it informally
in three stages. First, the program spectrum for each program component is
constructed as a function of T. Second, the indices of program components are
ordered in a suspiciousness list in decreasing order of suspiciousness. Third,
the suspiciousness list is returned to the user, who inspects each UUT
corresponding to each index in the list in decreasing order of suspiciousness
until a fault is found. We assume that in the case of ties of suspiciousness, the
UUT that comes earlier in the code is investigated first, and that the
effectiveness of an SBFL measure on a proband is measured by the number of
non-faulty UUTs a user has to investigate before a fault is found.
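The second and third stages amount to a stable sort on suspiciousness. The following sketch (`rank_components` is a hypothetical name) uses insertion sort, which keeps tied components in code order as assumed above. Fed the Naish-I scores reported in Example 7, it yields the list (4,1,3,5,2).

```c
#include <assert.h>

/* Fills order[0..n-1] with 1-based component indices in decreasing order of
   susp[]; ties keep the component that comes earlier in the code first. */
void rank_components(const double *susp, int n, int *order) {
    for (int i = 0; i < n; i++) {
        int j = i;
        /* shift only strictly less suspicious entries right, so equal
           scores keep their original (code) order: a stable sort */
        while (j > 0 && susp[order[j - 1] - 1] < susp[i]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = i + 1;
    }
}
```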


Example 7. We illustrate an instance of SBFL using our running minmax.c exam- 
ple of Fig. 1, and the Naish-I measure as an example suspiciousness measure. 
First, the program spectra (given in Example 6) are constructed as a function 
of the given coverage vectors (represented by the coverage matrix of Fig. 2). 
Second, the suspiciousness of each program component is computed (here, the 
suspiciousness values of the five components are 2.125, -0.375, 0.75, 2.875, and
0.625, respectively), and the indices of components are ordered according to decreasing order
of suspiciousness. Thus we get the list (4,1,3,5,2). Finally, the list is returned 
to the user, and the UUTs in the program are inspected according to this list in 
descending order of suspiciousness until a fault is found. In our running example, 
C4 (the fault) is investigated first. 


3 A Property of Single-Fault Optimal Data 


In this section, we identify a new property for the optimality of a given dataset T
for use in fault localisation. Throughout we make two assumptions: firstly, that
a single-fault optimal measure w is being used, and secondly, that there is a single
fault in a given faulty program (henceforth our two assumptions). Let (PM, T)
be a given sample proband model; then we have the following:


Definition 7. A PROPERTY OF SINGLE FAULT OPTIMAL DATA. T is single-fault
optimal if $\forall C_i \in PM_T.\ E \subseteq C_i \rightarrow E^* \subseteq C_i^*$.


If this condition holds, then we say the dataset T (and its associated test 
suite) satisfies this property of single fault optimality. Informally, the condition 
demands that if a UUT is covered by all failing test cases in the sample test suite 
then it is covered by all failing test cases in the population. If our two assumptions 
hold, we argue it is desirable that a test suite satisfies this property. This
is because the fault is assumed to be covered by all failing test cases in the
population (similar to the rationale of Naish et al. [7]), and as UUTs executed
by all failing test cases in the sample are investigated first when a single-fault
optimal measure is being used, it is desirable that UUTs not covered by all failing
test cases in the population are less suspicious in order to guarantee the fault
is found earlier. An additional benefit of knowing that one's data satisfies
this property is that we do not have to add any more failing test cases to a test
suite, given it is then impossible to improve fault localization effectiveness by
adding more failing test cases under our two assumptions.
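Definition 7 can be checked mechanically whenever the population failing vectors are known. In this sketch (`covered_by_all` and `single_fault_optimal` are hypothetical names) each failing vector is encoded as a bitmask over UUTs, with bit i-1 set iff $C_i$ is covered. The components covered by all vectors of a set are then the bitwise AND of their masks, and the property holds iff no component in the sample's AND is missing from the population's AND. The test uses the three failing vectors of Fig. 2 as a stand-in population, as Example 8 does.

```c
#include <assert.h>
#include <stdint.h>

/* Components covered by every vector in vs[0..n-1]
   (each vector is a UUT bitmask; all ones if n == 0). */
uint32_t covered_by_all(const uint32_t *vs, int n) {
    uint32_t common = ~UINT32_C(0);
    for (int k = 0; k < n; k++)
        common &= vs[k];
    return common;
}

/* Definition 7: every component covered by all sample failing vectors (E)
   must also be covered by all population failing vectors (E*). */
int single_fault_optimal(const uint32_t *sample, int ns,
                         const uint32_t *pop, int np) {
    assert(ns > 0); /* E is assumed non-empty */
    return (covered_by_all(sample, ns) & ~covered_by_all(pop, np)) == 0;
}
```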
Algorithm 1. Single-fault optimal data generation algorithm
Data: $E$, $E^*$ (pre-condition: $E \subseteq E^* \land E \neq \emptyset$)
1 repeat
2   $T \leftarrow choose(\{t_j^* \in E^* \mid \exists i \in \mathbb{N}.\ (\forall t_k \in E.\ c_i^k = 1) \land c_i^{j*} = 0\})$;
3   $E \leftarrow E \cup T$;
4 until $T = \emptyset$;
5 return E


4 Algorithm 


In this section we present an algorithm which outputs single fault optimal data 
for a given faulty program. We assume several preconditions for our algorithm. 


- For the given faulty program, at least one UUT is executed by all failing
test cases (for C programs this could be a variable initialization in the main
function).

- The population proband model is available (but as we shall see in the next
section, practical implementations will not require this).

- We also assume that E is a mutable set, and shall make use of a choose(X)
subroutine which non-deterministically returns a singleton set containing one
member of X (if one exists; otherwise it returns the empty set).


The algorithm is formally presented as Algorithm 1. We assume that an
associated sample test suite will also be available as a by-product of the algorithm,
in addition to the data E it produces. The intuition behind the algorithm is that
failing vectors are iteratively accumulated in a set E one by one, where the
next failing vector added does not cover some component which is covered by
all vectors already in E (the algorithm terminates if no such vector exists). The
resulting set is observed to be single-fault optimal. To illustrate the algorithm
we give the example below. We then give a proof of partial correctness.


Example 8. We assume some population set of failing coverage vectors $E^*$, which
we may identify with the set $\{t_1^*, t_2^*, t_3^*\} = \{(1,0,1,1,0,1,1), (1,0,0,1,1,1,2),$
$(1,0,0,1,0,1,3)\}$ described in the coverage matrix of Fig. 2. In reality, the
population set of failing coverage vectors for this faulty program is much larger
than this, but this will suffice for our example. The algorithm proceeds as
follows. First, we assume E is a non-empty subset of $E^*$, and thus may assume
$E = \{(1,0,1,1,0,1,1)\}$. Now, to evaluate step 2, we first evaluate the set
$\{t_j^* \in E^* \mid \exists i \in \mathbb{N}.\ (\forall t_k \in E.\ c_i^k = 1) \land c_i^{j*} = 0\}$. Intuitively, this is the set of
failing vectors in the population which do not cover some component which is
covered by all vectors in E. We may find a member of this set as follows. First,
we must evaluate the condition for when $E^* = \{t_1^*, t_2^*, t_3^*\}$. Given $c_3^1 = 1$ holds
of $t_1$, $t_1$ is the only member of E, and $c_3^{2*} = 0$, we conclude that $t_2^*$ is a
member of the set. Thus, for our example we may assume that choose returns $t_2^*$
from this set, such that $T = \{t_2^*\}$. So at step 3 the new version of E is
$E = \{(1,0,1,1,0,1,1), (1,0,0,1,1,1,2)\}$. Consequently, on the next iteration of
the loop the set condition will be unsatisfiable: there is no index i of a
component such that both $\forall t_k \in E.\ c_i^k = 1$ holds (i.e., $E \subseteq C_i$) and
$c_i^{j*} = 0$ holds for some vector $t_j^*$ in the population (i.e., not $E^* \subseteq C_i^*$).
Thus, choose will return the empty set, and the algorithm will terminate,
returning the dataset E to the user to be used in SBFL. Using the Naish-I measure
with this dataset, C1 and C4 are associated with the largest suspiciousness
score of 2.0. Thus, with single-fault optimal data alone we can find the
fault C4 reasonably effectively in our running example.
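A direct C rendering of Algorithm 1 follows, as a sketch under two assumptions of ours: failing vectors are UUT bitmasks (bit i-1 set iff $C_i$ is covered), and choose is resolved deterministically by taking the first qualifying population vector, which is one admissible outcome of the non-deterministic choice. `sfo_generate` and `in_e` are hypothetical names. With $E^* = \{t_1^*, t_2^*, t_3^*\}$ and initial $E = \{t_1\}$ it reproduces the run of Example 8; starting from $\{t_3\}$ it stops immediately, which is the minimum-size case discussed after the proof.

```c
#include <assert.h>
#include <stdint.h>

/* Algorithm 1 over E* = pop[0..np-1]. in_e[j] != 0 marks pop[j] as a member
   of E; the pre-condition requires E to be non-empty. On return, in_e marks
   the generated dataset and the result is |E|. */
int sfo_generate(const uint32_t *pop, int np, int *in_e) {
    int size = 0;
    uint32_t common = ~UINT32_C(0); /* components covered by all of E */
    for (int j = 0; j < np; j++)
        if (in_e[j]) { common &= pop[j]; size++; }
    assert(size > 0);
    for (;;) {                                   /* 1: repeat */
        int chosen = -1;                         /* 2: T <- choose(...) */
        for (int j = 0; j < np && chosen < 0; j++)
            /* t_j* qualifies if it misses a component covered by all of E
               (members of E never qualify, so skipping them is harmless) */
            if (!in_e[j] && (common & ~pop[j]) != 0)
                chosen = j;
        if (chosen < 0)
            return size;                         /* 4: until T is empty */
        in_e[chosen] = 1;                        /* 3: E <- E u T */
        common &= pop[chosen];
        size++;
    }
}
```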


Proposition 1. All datasets returned by Algorithm 1 are single-fault optimal. 


Proof. We show partial correctness as follows. Let $(PM^*, T^*)$ be a given
population proband model, where $E^* \subseteq T^*$ is the population set of failing vectors,
and let E be returned by the algorithm. We must show that for all $C_i \in PM_E$,
$E \subseteq C_i \rightarrow E^* \subseteq C_i^*$ (by def. of single fault optimality). We prove this by
contradiction. Assume there is some $C_i \in PM_E$ (without loss of generality we may
assume i = 1) such that $E \subseteq C_1$ but not $E^* \subseteq C_1^*$. Given we assume E has
been returned by the algorithm, we may assume $T = \emptyset$ (step 4), and thus choose
returned $\emptyset$ at step 2 (by def. of choose). Accordingly, there is no $t_j^* \in E^*$ where
$((\forall t_k \in E)\, c_1^k = 1) \land c_1^{j*} = 0$ (by the set condition at step 2). Thus,
$(\forall t_j^* \in E^*)\, (((\forall t_k \in E)\, c_1^k = 1) \rightarrow c_1^{j*} = 1)$. Now, $(\forall t_k \in E)\, c_1^k = 1$
just in case $E \subseteq C_1$ (by def. of program models). So, $(\forall t_j^* \in E^*)$, if $E \subseteq C_1$
then $c_1^{j*} = 1$ (by substitution of equivalents). Equivalently, if $E \subseteq C_1$, then
$(\forall t_j^* \in E^*)\, c_1^{j*} = 1$. Now, in general it holds that $(\forall t_j^* \in E^*)\, c_1^{j*} = 1$ just in
case $E^* \subseteq C_1^*$ (by def. of program models). Thus $E \subseteq C_1 \rightarrow E^* \subseteq C_1^*$ (by
substitution of equivalents). This contradicts the initial assumption.


Finally, we informally observe that the maximum size of the E returned is the
number of UUTs. In this case E is input to the algorithm with a failing vector that
covers all components, and choose always returns a failing vector that covers one
fewer UUT than the failing vector covering the fewest UUTs already in E (noting
that we assume at least one component will always be covered). The minimum
is one. In this case E is input to the algorithm with a failing vector which covers
some components and the post-condition is already fulfilled. In general, E can
potentially be much smaller than $E^*$.


5 Implementation 


We now discuss our implementation of the algorithm. In practice, we can leverage
model checkers to compute members of $E^*$ (the population set of failing vectors)
on the fly, whereas computing $E^*$ as a pre-condition would usually be intractable.
This can be done by appeal to an SMT solving subroutine, which we describe as
follows. Given a formal model of some code $F_{code}$, a formal specification $\phi$, a set
of Booleans $\{C_1, \ldots, C_n\}$ which are true just in case a corresponding UUT is
executed in a given execution, and a set $E \subseteq E^*$, we can use an SMT solver to
return a satisfying assignment by calling
$SMT(F_{code} \land \neg\phi \land \bigvee_{i \,:\, \forall t_k \in E.\, c_i^k = 1} (C_i = 0))$, and
then extracting a coverage vector from that assignment. A subroutine which
returns this coverage vector (or the empty set if one does not exist) can act
as a substitute for the choose subroutine in Algorithm 1, and the generation
of a static object $E^*$ is no longer required as an input to the algorithm. Our
implementation of this is called sfo (single fault optimal data generation tool).
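For a program as small as minmax.c, the solver query can be imitated by exhaustive search, which makes its meaning easy to test. The sketch below is not how sfo works (sfo calls an SMT solver inside CBMC): it enumerates inputs from an assumed domain $\{0, \ldots, max\}^3$ and returns the coverage mask of the first failing run that misses at least one component covered by all of E, or 0 to play the role of the empty set. `minmax_coverage` and `choose_failing` are hypothetical names, and `common` is the bitmask of components covered by every vector in E.

```c
#include <assert.h>
#include <stdint.h>

/* Coverage mask of minmax.c on one input (bit i-1 set iff C_i executes);
   *failed is set to 1 iff the assertion least <= most is violated. */
uint32_t minmax_coverage(int in1, int in2, int in3, int *failed) {
    uint32_t cov = 1u;                                 /* C1: always runs */
    int least = in1, most = in1;
    if (most < in2)  { most = in2;  cov |= 1u << 1; }  /* C2 */
    if (most < in3)  { most = in3;  cov |= 1u << 2; }  /* C3 */
    if (least > in2) { most = in2;  cov |= 1u << 3; }  /* C4 (bug) */
    if (least > in3) { least = in3; cov |= 1u << 4; }  /* C5 */
    *failed = !(least <= most);
    return cov;
}

/* Brute-force stand-in for SMT(F_code & !phi & OR_{i in common} C_i = 0). */
uint32_t choose_failing(uint32_t common, int max) {
    for (int a = 0; a <= max; a++)
        for (int b = 0; b <= max; b++)
            for (int c = 0; c <= max; c++) {
                int failed;
                uint32_t cov = minmax_coverage(a, b, c, &failed);
                if (failed && (common & ~cov) != 0)  /* misses some C_i */
                    return cov;
            }
    return 0; /* no such execution: corresponds to the empty set */
}
```

On the domain {0,1,2}, starting from E = {t1} (common components C1, C3, C4) it finds the failing input (1,0,1), which misses C3. Once the common components are only C1 and C4 it finds nothing, because in minmax.c a run can only fail if C4 executes.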

We now discuss extensions of sfo. It is known that adding passing executions
helps in SBFL [4,5,7-12]; thus, to develop a more effective fault localisation
procedure, we developed a second implementation sfo_p (sfo with passing traces)
that runs sfo and then adds passing test cases. To do this, after running sfo we
call an SMT solver 20 times to find up to 20 new passing executions, where on each
call, if the vector found has new coverage properties (it does not cover exactly the
same UUTs as any passing vector already computed), it is added to a set of passing vectors.
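The duplicate check used by sfo_p can be sketched as follows. `add_passing_if_new` is a hypothetical name, passing vectors are assumed to be UUT bitmasks, and the capacity of 20 mirrors the 20 solver calls described above. A vector is kept only if its coverage differs from every passing vector already stored.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PASSING 20 /* at most one new vector per solver call */

/* Appends cov to pass[0..*n-1] unless a stored passing vector has exactly
   the same coverage; returns 1 if it was added and 0 otherwise. */
int add_passing_if_new(uint32_t *pass, int *n, uint32_t cov) {
    for (int k = 0; k < *n; k++)
        if (pass[k] == cov)
            return 0;
    assert(*n < MAX_PASSING);
    pass[(*n)++] = cov;
    return 1;
}
```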

Our implementations of sfo and sfo_p are integrated into a branch of the
model checker CBMC [32]. Our branch of the tool is available for download at the
URL given in footnote³. Our implementations, along with generating fault
localisation data, rank UUTs by degree of suspiciousness according to the Naish-I 
measure and report this fault localisation data to the user. 


6 Experimentation 


In this section we provide details of evaluation results for the use of sfo and sfo_p
in fault localisation. The purpose of the experiment is to demonstrate that imple- 
mentations of Algorithm 1 can be used to facilitate efficient and effective fault 
localisation in practice on small programs (<2.5KLOC). We think generation 
of fault localisation information in a few seconds (<2) is sufficient to demon- 
strate practical efficiency, and ranking the fault in the top handful of the most 
suspicious lines of code (<5) on average is sufficient to demonstrate practical 
effectiveness. In the remainder of this section we present our experimental setup 
(where we describe our scoring system and benchmarks), and our results. 


6.1 Setup 


For the purposes of comparison, we tested the fault localisation potential of sfo
and sfo_p against a method named 1f, which performs SBFL when only a single
failing test case was generated by CBMC (and thus UUTs covered by the test 
case were equally suspicious). We used the following scoring method to evaluate 
the effectiveness of each of the methods for each benchmark. We envisage an 
engineer who is inspecting each LOC in descending order of suspiciousness using 
a given strategy (inspecting lines that appear earlier in the code first in the case 
of ties). We rank alternative techniques by the number of non-faulty LOC that 
are investigated until the engineer finds a fault. Finally, we report the average of 
these scores for the benchmarks to give us an overall measure of fault localisation 
effectiveness. 
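The scoring rule can be computed without sorting. In this sketch (`inspection_score` is a hypothetical name, and units are assumed to be indexed in code order) the engineer first reaches the faulty unit with the highest suspiciousness, the earliest in the code on ties, and the score counts the non-faulty units ranked ahead of it. With the Naish-I scores of Example 7 the score is 0; with the scores from Example 8's two-vector dataset (C1 and C4 tied at 2.0) it is 1.

```c
#include <assert.h>

/* Number of non-faulty units inspected before the first faulty one, when
   units are inspected in decreasing suspiciousness and ties are broken by
   position in the code (lower index first). faulty[i] != 0 marks a fault. */
int inspection_score(const double *susp, const int *faulty, int n) {
    int best = -1; /* faulty unit reached first */
    for (int i = 0; i < n; i++)
        if (faulty[i] && (best < 0 || susp[i] > susp[best]))
            best = i;
    assert(best >= 0); /* at least one fault must be marked */
    int score = 0;
    for (int j = 0; j < n; j++)
        if (!faulty[j] &&
            (susp[j] > susp[best] || (susp[j] == susp[best] && j < best)))
            score++;
    return score;
}
```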


3 https://github.com/theyoucheng/cbmc.
We now discuss the benchmarks used in our experiments. In order to test our
techniques in an unbiased experiment, we required that our
benchmarks satisfy the following three properties (aside from being a
C program which CBMC could be used on): 


1. Programs must have been created by an independent source, to prevent any 
implicit bias caused by creating benchmarks ourselves. 

2. Programs must have an explicit, formally stated specification that can be 
given as an assertion statement in order to apply a model checker. 

3. In each program, the faulty code must be clearly identifiable, in order to be 
able to measure the quality of fault localisation. 


Unfortunately, benchmarks satisfying these conditions are rare. In practice, 
benchmarks exist in verification research that satisfy either the second or third 
criterion, but rarely both. For instance, the available SIR benchmarks satisfy
the third criterion, but not the second⁴. The software verification competition
(SV-COMP) benchmarks satisfy the second criterion, but almost never satisfy the
third⁵. Furthermore, it is often difficult to obtain benchmarks from authors even
when usable benchmarks do in fact exist. Finally, we have been unable to find 
an instance of a C program that was not artificially developed for the purposes 
of testing. 

The benchmarks are described in Table 1, where we give the benchmark
name, the number of faults in the program, and lines of code (LOC). The modified
versions of tcas were made available by Alex Groce via personal correspondence
and were used with the EXPLAIN tool in [33]⁶. The remaining benchmarks were
identified as usable by manual investigation and testing in the repositories of
SV-COMP 2013 and 2017. We have made our benchmarks available for download
directly from the link in footnote 4. Faults in SV-COMP programs were identified
by comparing them to an associated fault-free version (in tcas the fault was
already identified). A series of contiguous lines of code that differed from the
fault-free version (usually one line, and rarely up to 5 LOC for larger programs)
constituted one fault. LOC were counted using the cloc utility.

We give further details about our application of CBMC in this experiment. 
For all our benchmarks, we used the smallest unwinding number that enables 
the bounded model checker to find a counterexample. These counterexamples 
were sliced, which usually results in a large improvement in fault localisation. 
For details about unwindings and slicing see the CBMC documentation [34]. In 
each benchmark each executable statement (variable initialisations, assignments, 
or condition statements) was determined as a UUT. 


4 http://sir.unl.edu/portal/index.php.
5 Benchmarks can be accessed at https://sv-comp.sosy-lab.org/2018/. 
6 For our experiment we activated assertion statement P5a and fault 32c. 
6.2 Results and Discussion


In this section we discuss our experimental results. In Table 1, columns
1f, sfo and sfo_p give the scores for when the respective method is used. The two
t columns give the runtime in seconds of CBMC and of sfo_p, respectively (we ignore
the runtime of sfo due to negligible difference). |E| and |Ē| give the number of
failing and passing test cases generated by sfo_p. The AVG row gives the average of
each column. We are primarily interested in comparing the scores of sfo_p and 1f.


Table 1. Experimental results


#   Benchmark          Faults  LOC      1f     t (s)  sfo    sfo_p  t (s)  |E|   |Ē|
1   cdaudio_simpl1     4       2102     24     1.04   22     13     1.10   3     8
2   floppy_simpl3      6       1080     39     0.36   33     8      0.38   3     11
3   s3_clnt_1          1       546      35     3.52   33     3      3.56   2     7
4   kundu2             3       534      63     0.58   63     7      0.60   1     13
5   tcas               1       396      6      0.20   5      5      0.21   2     4
6   rule57_ebda        4       249      9      0.17   9      2      0.18   1     4
7   rule60_list2       1       187      14     0.17   14     8      0.18   1     3
8   merge_sort         1       111      1      2.19   1      1      2.32   1     0
9   byte_add           1       90       17     0.18   15     0      0.18   3     8
10  alternating_list   2       56       1      0.31   1      1      0.32   1     0
11  eureka_01          1       52       7      0.17   7      3      0.26   1     7
12  string             1       43       5      0.17   2      2      0.17   3     3
13  insertion_sort     1       25       3      1.05   3      0      4.28   1     3
    AVG                2.08    420.85   17.23  0.78   16.00  4.08   1.06   1.77  5.46


We now discuss the results of the three techniques 1f, sfo and sfo_p. On
average, 1f located a fault after investigating 17.23 lines of code (4.09% of the 
program on average). The results here are perhaps better than expected. We 
observed that the single failing test case consistently returned good fault locali- 
sation potential given the use of slicing by the technique. 

We now discuss sfo. On average, sfo located a fault after investigating 16
lines of code (3.8% of the program on average). Thus, the improvement over 1f
is very small. When only one failing test case was available for sfo (i.e., |E| = 1),
we emphasise that the SMT solver could not find any other failing traces which
covered different parts of the program. In such cases, sfo performed the same
as 1f (as expected). However, when there was more than one failing test case
available (i.e., |E| > 1), sfo always made a small improvement. Accordingly, for
benchmarks 1, 2, 3, 5, 9, and 12 the improvements in terms of fewer LOC examined
are 2, 6, 2, 1, 2, and 3, respectively. An improvement in benchmarks where sfo
generated more than one test case is to be expected, given there was always a
fault covered by all failing test cases in each program (even in programs with
multiple faults), thus taking advantage of the property of single fault optimal
data. Finally, we conjecture that on programs with more failing test cases
available in the population, and on longer faulty programs, this improvement
will be larger.

We now discuss sfo_p. On average, sfo_p located a fault after investigating
4.08 LOC (0.97% of each program on average). Thus, the improvement over
the other techniques is quite large (about four times as effective as 1f). Moreover,
this effectiveness came at very little expense to runtime: sfo_p had an average
runtime of 1.06 s, which is comparable to the runtime of 1f of 0.78 s. This is despite
the fact that sfo_p generated over 7 executions on average. We consequently
conclude that implementations of Algorithm 1 can be used to facilitate efficient
and effective fault localisation in practice on small programs.


7 Related Work 


The techniques discussed in this paper improve the quality of data usable for 
SBFL. We divide the research in this field into the following areas; many other 
methods can be potentially combined with our technique. 


Test Suite Expansion. One approach to improving test suites is to add more 
test cases which satisfy a given criterion. A prominent criterion is that the test 
suite has sufficient program coverage, where studies suggest that test suites with 
high coverage improve fault localisation [15-17, 20]. Other ways to improve test
suites for SBFL are as follows. Li et al. generate test suites for SBFL, considering 
failing to passing test case ratio to be more important than number [35]. Zhang 
et al. consider cloning failed test cases to improve SBFL [13]. Perez et al. develop a 
metric for diagnosing whether a test suite is of sufficient quality for SBFL to take 
place [14]. Li et al. consider weighing different test cases differently [36]. Aside 
from coverage criteria, methods have been studied which generate test cases 
with a minimal distance from a given failed test case [18]. Baudry et al. use 
a bacteriological approach in order to generate test suites that simultaneously 
facilitate both testing and fault localisation [19]. Concolic execution methods 
have been developed to add test cases to a test suite based on their similarity to 
an initial failing run [20]. 

Prominent approaches which leverage model checkers for fault localisation
are as follows. Groce [33] uses integer linear programming to find a passing test
case most similar to a failing one and then compares the two. Schupman and
Bierre [37] generate short counterexamples for use in fault localisation, where
a short counterexample usually means fewer UUTs for the user to inspect.
Griesmayer [38] and Birch et al. [39] use model checkers to find failing executions
and then check whether a given number of changes to the values of variables
can make the counterexample disappear. Gopinath et al. [40] compute
minimal unsatisfiable cores in a given failing test case, where statements in
the core are given a higher suspiciousness level in the spectra ranking. Additionally,
when generating a new test, they generate an input whose test case is
most similar to the initial run in terms of its coverage of the statements. Fey
et al. [41] use SAT solvers to localise faults on hardware with LTL specifications.
In general, experimental scale is limited to a small number of programs in these
studies, and we believe our experimental component improves on this in
terms of experimental scale (13 programs).


Test Suite Reduction. An alternative approach to expanding a test suite is to 
use reduction methods. Recently, many approaches have demonstrated that it is 
not necessary for all test cases in a test suite to be used. Rather, one can select 
a handful of test cases in order to minimise the number of test cases required for 
fault localisation [42,43]. Most approaches are based on a strategy of eliminating 
redundant test cases relative to some coverage criterion. The effectiveness of 
applying various coverage criteria in test suite reduction is traditionally based 
on empirical comparison of two metrics: one which measures the size of the 
reduction, and the other which measures how much fault detection is preserved. 


Slicing. A prominent approach to improving the quality of test suites involves
slicing test cases. Here, SBFL proceeds as usual except that the program
and/or the test cases composing the test suite are sliced (with irrelevant lines
of code or parts of the execution removed). For example, Alves et al. [44] combine
Tarantula with dynamic slices, and Ju et al. [45] use SBFL in combination with
both dynamic and execution slices. Syntactic dynamic slicing is built into all
our tested approaches via the functionality of CBMC.

To our knowledge, no previous methods generate data which exhibit our 
property of single fault optimality. 


8 Conclusion 


In this paper, we have presented a method to generate single fault optimal data 
for use with SBFL. Experimental results on our implementation sfo,,, which inte- 
grates single fault optimal data along with passing test cases, demonstrate that 
small optimized fault localisation data can be generated efficiently in practice 
(1.06 s on average), and that subsequent fault localization can be performed effec- 
tively using this data (investigating 4.06 LOC until a fault is found). We envisage 
that implementations of the algorithm can be used in two different scenarios. 
In the first, the generated test suite can be used in standalone fault localisation,
providing a small, low-cost test suite useful for repeated iterations of
simultaneous testing and fault localisation during program development. In the
second, the generated data can be added to any pre-existing data associated
with a test suite, which may be useful at the final testing stage where we may
wish to optimise single fault localisation.

Future work involves finding larger benchmarks on which to evaluate our
implementation, and developing further properties and methods for programs with
multiple faults. We would also like to combine our technique with existing test
suite generation algorithms to investigate how much test suites can
additionally be improved for the purposes of fault localisation.

Abstract. GUI testing of mobile applications has become an increasingly
important topic over the last decade with the growth of the mobile application
market. We propose Test Case Mutation (TCM), which mutates existing
test cases to produce richer test cases. These mutated test cases detect
crashes that were not previously detected by existing test cases. TCM differs
from the well-known Mutation Testing (MT), where mutations are
inserted in the source code of an Application Under Test (AUT) to measure
the quality of test cases. In TCM, by contrast, we modify existing test
cases and obtain new ones to increase the number of detected crashes.
Android applications take the largest portion of the mobile application
market. Hence, we evaluate TCM on Android by replaying mutated test
cases of 100 randomly selected AUTs from F-Droid benchmarks. We show
that TCM is effective at detecting new crashes in a given time budget.


1 Introduction 


As of April 2016, there are over 2.6 billion smartphone users worldwide, and
this number is expected to go up [1]. Over the last decade, mobile application
testing has received increasing attention in top testing conferences and
journals [2]. Android applications have the largest share of the mobile application
market: 82.8% of all mobile applications are designed for Android [1].
Therefore, we focus on Android GUI testing in this paper.

The main idea of TCM is to mutate existing test cases to produce richer 
test cases in order to increase the number of detected crashes. We first iden- 
tify typical crash patterns that exist in Android applications. Then, we develop 
mutation operators based on these crash patterns. Typically mutation operators 
are applied to the source code of applications. However, in our work we apply 
them to test cases. 

Typical crash patterns in Android are Unhandled Exceptions, External 
Errors, Resource Unavailability, Semantic Errors, and Network-Based Crashes 
[3]. We describe one case study for each crash pattern. We define six novel muta- 
tion operators (Loop-Stressing, Pause-Resume, Change Text, Toggle Contextual 
State, Remove Delays, and Faster Swipe) and relate them to these five crash 
patterns. 

We implement TCM on top of AndroFrame [4], a fully automated Android GUI testing
tool. We give an overview of TCM in Fig. 1. First, we generate a test suite for the
Application Under Test (AUT) using AndroFrame. AndroFrame obtains an AUT Model which
is represented as an Extended Labeled Transition System (ELTS). We then minimize the
Generated Test Suite using the AUT Model in order to reduce test execution costs
(Test Suite Minimization). We apply Test Case Mutation (TCM) on the Minimized Test
Suite and obtain a Mutated Test Suite. We use AndroFrame to execute the Mutated Test
Suite and collect Test Results.

Fig. 1. TCM overview (diagram components: AUT, AndroFrame, Generated Test Suite, AUT Model, Minimized Test Suite, Mutated Test Suite, Test Results)

We state our contributions as follows:

1. Test Case Mutation Operators. We define six mutation operators on Android 
test cases to uncover new crashes. Our mutation operators are based on typi- 
cal Android crash patterns described in the literature [3]. All of the mutation 
operators are novel with the exception of changing text inputs. To the best 
of our knowledge, ours is the first work to use mutation-based test case gen- 
eration to detect different crash patterns in Android. 

2. Test Case Mutation (TCM) Algorithm. We describe a novel algorithm to 
generate new test cases from existing ones to detect more crashes. 

3. Test Suite Minimization Algorithm. We propose a coverage-based minimiza- 
tion algorithm to increase the effectiveness of TCM. 

4. Case Studies. We relate known Android crash patterns to our mutation oper- 
ators using case studies from F-Droid benchmarks. 

5. Experiments. We evaluate TCM for crash detection on 100 AUTs downloaded
from F-Droid benchmarks. We investigate how coverage and the number
of detected crashes change over time.


2 Background 


In this section, we first describe the basics of the Android GUI to facilitate the 
understanding of our paper. 

Android GUI is based on activities, events, and crashes. An activity is a
container for a set of GUI components. These GUI components can be seen on the
Android screen. Each GUI component has properties that describe the boundaries
of the component in pixels (x1, y1, x2, y2) and how the user can interact with
the component (enabled, clickable, longclickable, scrollable, password). Each GUI
component also has a type property from which we can tell whether
the component accepts text input. A GUI component accepts text input if its
password property is true or its type is EditText.
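As a small illustration of the text-input rule above, the following Python sketch models a GUI component with the properties listed in the text, plus a bounding-box check that one could use to decide whether a click at (x, y) hits the component. The class name `GuiComponent`, its field names, and the assumption that the type holds a class name such as `android.widget.EditText` are our own, not part of AndroFrame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuiComponent:
    """Hypothetical record of a GUI component's properties (not AndroFrame's API)."""
    x1: int  # left boundary in pixels
    y1: int  # top boundary in pixels
    x2: int  # right boundary in pixels
    y2: int  # bottom boundary in pixels
    widget_type: str  # assumed to hold a class name such as "android.widget.EditText"
    enabled: bool = True
    clickable: bool = False
    longclickable: bool = False
    scrollable: bool = False
    password: bool = False

    def accepts_text_input(self) -> bool:
        # Rule from the text: the password property is true or the type is EditText.
        simple_name = self.widget_type.rsplit(".", 1)[-1]
        return self.password or simple_name == "EditText"

    def contains(self, x: int, y: int) -> bool:
        # True if pixel (x, y) lies inside the component's bounding box.
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


field = GuiComponent(0, 100, 720, 180, "android.widget.EditText")
button = GuiComponent(0, 200, 720, 280, "android.widget.Button", clickable=True)
print(field.accepts_text_input(), button.accepts_text_input())  # True False
```

Matching on the simple class name accepts both `EditText` and `android.widget.EditText`; whether subclasses of EditText should also count is left open here.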

Table 1. List of GUI actions

Non-contextual | Param1 | Param2 | Param3 | Param4 | Param5
Click          | x      | y      | -      | -      | -
Longclick      | x      | y      | -      | -      | -
Text           | x      | y      | string | -      | -
Swipe          | x1     | y1     | x2     | y2     | duration
Menu           | -      | -      | -      | -      | -
Back           | -      | -      | -      | -      | -

Contextual     | Parameter
Connectivity   | on/off/toggle
Bluetooth      | on/off/toggle
Location       | gps/gps&network/off/toggle
Planemode      | on/off/toggle
Doze           | on/off/toggle

Special        | Param1  | Param2   | Param3
Reinitialize   | Package | Activity | -

The Android system and the user can interact with GUI components using
events. We divide events into two categories: system events and GUI events
(actions). We show the list of GUI actions that we use in Table 1, which covers
more actions than are typically used in the literature. Note that GUI actions
in Table 1 are possible inputs from the user, whereas system events are not.
We group actions into three categories: non-contextual, contextual, and special.
Non-contextual actions correspond to actions that are triggered by user gestures.
Click and longclick take two parameters, the x and y coordinates to click on. Text
takes three parameters: x and y for coordinates, and a string that describes what to
write. Swipe takes five parameters. The first four parameters describe the starting
and ending coordinates. The fifth parameter is used to adjust the speed
of the swipe. Menu and back actions have no parameters; they press
the menu and back buttons of the mobile device, respectively. Contextual actions
correspond to the user changing the contextual state of the AUT. The contextual
state is the concatenation of the global attributes of the mobile device (internet
connectivity, bluetooth, location, planemode, sleeping). The connectivity action
adjusts the internet connectivity of the mobile device (it adjusts wifi or mobile
data, whichever is available on the device). Bluetooth, location,
and planemode are straightforward. The doze action taps the power button of
the mobile device and puts the device to sleep or wakes it. We use the doze
action to pause and resume the AUT. Our only special action is reinitialize,
which reinstalls and starts an AUT. System events are system-generated events,
e.g. battery level, receiving SMS, clock/timer.


TCM: Test Case Mutation to Improve Crash Detection in Android 267 


We report a crash whenever a fatal exception is recorded in Android logs,
similar to previous work [3,5]. Crashes often result in the AUT terminating,
with or without any warning. Some crashes do not visually affect the execution,
but the AUT halts as a result.

We use the Extended Labeled Transition System (ELTS) [6] as a model for
the AUT. Formally, an ELTS $M = (V, v_0, \Sigma, \omega, \lambda)$ is a 5-tuple, where

- $V$ is a set of states (vertices),
- $v_0 \in V$ is the initial state,
- $\Sigma$ is the set of all actions (input alphabet),
- $\omega \subseteq V \times V \times \Sigma$ is the state transition relation, and
- $\lambda : V \to \mathcal{P}(\Sigma)$ is a state labeling function, where for all $v \in V$, $\lambda(v) \subseteq \Sigma$ denotes the set of actions enabled at state $v$.


We define a GUI state, or simply a state v to be the concatenation of the (1) 
package name (a name representing the AUT), (2) activity name, (3) contextual 
state, and (4) GUI components. 

Each state $v$ has a set of enabled actions $\lambda(v)$, extracted from its set of GUI
components. We say that a GUI action, or simply an action $z \in \lambda(v)$, is enabled
at state $v$ iff we can deduce that $z$ interacts with at least one GUI component
in $v$.

A transition is a 3-tuple (start-state, end-state, action), denoted for short by
$(v_s, v_e, z)$. We extend the standard transition and define a delayed transition as
a 4-tuple (start-state, end-state, action, delay in seconds), denoted for short by
$(v_s, v_e, z, d)$. We do this so that we can later change the duration of transitions via mutation.
We define an execution trace, or simply a trace $t$, as a sequence of delayed transitions.
An example trace is $t = (v_1, v_2, z_1, d_1), (v_2, v_3, z_2, d_2), \ldots, (v_n, v_{n+1}, z_n, d_n)$,
where $n$ is the length of the trace.

We say that a trace $t$ is a test case if the first state of the trace is the initial
state $v_0$ (the GUI state when the AUT is started). A test suite TS is a set of test
cases. AndroFrame generates these test suites. Then, TCM applies minimization
and mutation to generate new test suites.
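The definitions of delayed transitions, traces, and test cases above translate directly into simple data types. The Python sketch below is illustrative only: the names `DelayedTransition`, `is_test_case`, and `total_delay` are hypothetical, and states and actions are represented as plain strings.

```python
from typing import List, NamedTuple


class DelayedTransition(NamedTuple):
    """A delayed transition (v_s, v_e, z, d); states and actions are plain strings here."""
    start: str    # start state v_s
    end: str      # end state v_e
    action: str   # action z
    delay: float  # delay d in seconds


Trace = List[DelayedTransition]  # a trace is a sequence of delayed transitions


def is_test_case(trace: Trace, initial_state: str) -> bool:
    # A trace is a test case if its first state is the initial state v_0.
    return len(trace) > 0 and trace[0].start == initial_state


def total_delay(trace: Trace) -> float:
    # Sum of the delays d_1 + ... + d_n of the trace.
    return sum(t.delay for t in trace)


t = [
    DelayedTransition("v0", "v1", "click", 1),
    DelayedTransition("v1", "v1", "back", 0),
    DelayedTransition("v1", "v2", "click", 2),
]
print(is_test_case(t, "v0"), is_test_case(t[1:], "v0"), total_delay(t))  # True False 3
```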


3 Android Crash Patterns and Mutation Operators 


In this section, we first describe typical crash patterns for Android applications
based on related work in the literature [3]. We give a list of the crash patterns
in Table 2 and describe them below.


3.1 Android Crash Patterns 


C1. Unhandled Exceptions. An AUT may crash due to misuse of libraries 
or GUI components, e.g. overuse of a third party library (stressing) may cause 
the third party library to crash. 

C2. External Errors. An AUT may communicate with external applications.
This communication requires either permissions or valid Inter Process Communication
(IPC) for Android. There are three types of IPC in Android: intents,
binders, and shared memory. Intents are used to send messages between applications.
These messages are called bundles. Binders are used to invoke methods
of other applications. An AUT may crash with an external error if (1) the
AUT attempts to communicate with another application without sufficient permissions,
(2) the AUT receives an intent with an invalid bundle from another
application, (3) the AUT sends an intent with an invalid bundle and fails to
receive an answer due to a crash in the other application, (4) another application
uses a binder with illegal arguments, (5) the AUT uses a binder on another
application with illegal arguments and fails to receive the return value due to
a crash in the other application, or (6) shared memory of the AUT is freed by
another application.


Table 2. Relating crash patterns and mutation operators

Crash patterns              | Mutation operators
C1. Unhandled Exceptions    | M1, M3, M6
C2. External Errors         | M1, M4, M5, M6
C3. Resource Unavailability | M2, M5
C4. Semantic Errors         | M3
C5. Network-Based Crashes   | M4, M5, M6


C3. Resource Unavailability. In Android, an AUT may be paused at any 
time by executing an onPause() method. This method is very brief and does 
not necessarily afford enough time to perform save operations. The onPause() 
method may terminate prematurely if its operations take too much time, causing 
a resource unavailability problem that may crash the AUT when it is resumed. 
Another problem is that an AUT may use one or more system resources such 
as memory and sensor handlers (e.g. orientation) during execution. When the 
AUT is paused, it releases system resources. The AUT may crash if it is unable 
to allocate these resources back when it is resumed. 


C4. Semantic Errors. An AUT may crash if it fails to handle certain inputs
given by the user. For example, the AUT may crash instead of generating a warning
if a textbox is left empty or contains unexpected text.


C5. Network-Based Crashes. An AUT may connect with remote servers or 
peers via bluetooth or wifi. The AUT may crash and terminate if it does not 
handle the cases where the server is unreachable, the connectivity is disabled, or 
the communicated data causes an error in the AUT. 

3.2 Mutation Operators


We now define the set of Android mutation operators that we developed. We
denote these operators by $\Delta$. We describe these mutation operators, then relate
them to the crash patterns above, and summarize these relations in Table 2.


Definition 1. A mutation operator $\delta$ is a function which takes a test case $t$ and
returns a new test case $t'$. We denote a mutation as $t' = \delta(t)$.


M1. Loop-Stressing ($\delta_{LS}$). $t' = \delta_{LS}(t)$ reexecutes all looping actions of a test
case $t$ multiple times with a delay of $d'$ seconds. An action $z_i$ of a delayed transition
$t_i = (v_i, v_{i+1}, z_i, d_i)$ in $t$ is looping iff $v_{i+1} = v_i$. Let $t_{j \ldots k}$ denote the subsequence
of actions between the $j$th and $k$th indices of test case $t$, inclusively. Then,

$$\delta_{LS}(t) = t_1^{LS} \cdot t_2^{LS} \cdots t_n^{LS}, \quad \text{where } t_i^{LS} = \begin{cases} t_i & \text{if } v_{i+1} \neq v_i \\ \underbrace{t_i' \cdot t_i' \cdots t_i'}_{m \text{ times}} \cdot t_i & \text{if } v_{i+1} = v_i \end{cases} \qquad (1)$$

Here $n$ is the length of test case $t$ and $t_i' = (v_i, v_{i+1}, z_i, d')$. We pick $d' = 1$ to avoid a
double-click, which may be programmed as a separate action from a single click.
We pick $m = 9$, for two reasons. First, in our case
studies, we did not encounter a crash when $m < 9$. Second, although we detect
the same crash when $m > 9$, we want to keep $m$ as small as possible to keep
test cases small. Loop-stressing may lead to an unhandled exception (C1) by
stressing third party libraries through repeated invocation. Loop-stressing
may also lead to an external error (C2) if it stresses another application until it
crashes.
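A minimal Python sketch of loop-stressing follows. It assumes the ordering visible in test case Mutated 1 of the motivating example (Fig. 2), where the m = 9 repetitions with delay d' = 1 are placed before the original looping transition. The type `T` and the function name `loop_stress` are hypothetical.

```python
from typing import List, NamedTuple


class T(NamedTuple):
    """Delayed transition (v_i, v_{i+1}, z_i, d_i)."""
    start: str
    end: str
    action: str
    delay: float


def loop_stress(trace: List[T], m: int = 9, d_prime: float = 1.0) -> List[T]:
    """Sketch of delta_LS: repeat every looping action m extra times with delay d_prime."""
    out: List[T] = []
    for t in trace:
        if t.start == t.end:  # looping action: v_{i+1} = v_i
            out.extend([t._replace(delay=d_prime)] * m)  # m copies of t_i'
        out.append(t)  # the original transition t_i is kept
    return out


# Test case D from the motivating example ("-" marks the state before reinitialize).
D = [T("-", "v1", "reinit", 15), T("v1", "v1", "back", 0), T("v1", "v2", "click", 2),
     T("v2", "v1", "back", 1), T("v1", "v3", "menu", 3)]
print(len(loop_stress(D)))  # 14, the length of Mutated 1
```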


M2. Pause-Resume ($\delta_{PR}$). $t' = \delta_{PR}(t)$ adds two consecutive doze actions
between all transitions of the test case $t$. Let $t_i^{PR} = (v_i, \text{doze off}, 2) \cdot (v_i, \text{doze on}, 2)$.
Then,

$$\delta_{PR}(t) = t_1 \cdot t_2^{PR} \cdot t_2 \cdot t_3^{PR} \cdot t_3 \cdots t_n^{PR} \cdot t_n \qquad (2)$$

Pause-resume may trigger a crash due to resource unavailability (C3).
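The following Python sketch mirrors pause-resume as it appears in test case Mutated 2 of Fig. 2: before every transition except the first, a doze off/doze on pair with a 2 s delay is inserted in the state where that transition starts. Using "-" for the unknown intermediate state follows the figure; the names `T` and `pause_resume` are hypothetical.

```python
from typing import List, NamedTuple


class T(NamedTuple):
    start: str
    end: str
    action: str
    delay: float


def pause_resume(trace: List[T], pause_delay: float = 2.0) -> List[T]:
    """Sketch of delta_PR: insert a doze off/doze on pair between consecutive transitions."""
    if not trace:
        return []
    out = [trace[0]]
    for t in trace[1:]:
        v = t.start  # the state in which the AUT is paused and resumed
        out.append(T(v, "-", "doze off", pause_delay))  # press power: device sleeps
        out.append(T("-", v, "doze on", pause_delay))   # press power: device wakes
        out.append(t)
    return out


D = [T("-", "v1", "reinit", 15), T("v1", "v1", "back", 0), T("v1", "v2", "click", 2),
     T("v2", "v1", "back", 1), T("v1", "v3", "menu", 3)]
print(len(pause_resume(D)))  # 13, the length of Mutated 2
```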


M3. Change Text ($\delta_{CT}$). We assume that existing test cases contain well-behaving
text inputs in order to explore the AUT as much as possible. To increase the
number of detected crashes, we modify the contents of the texts.

$t' = \delta_{CT}(t)$ first picks one random abnormal text manipulation operation and
applies it to a random textentry action of the existing test case $t$. The abnormal text
manipulation operations are emptytext, dottext, and longtext, where emptytext
deletes the text, dottext enters a single dot character, and longtext enters a
random string of length greater than 200.

Let $z_i^{CT}$ denote a random abnormal text manipulation action, where $z_i$ is a
text action, and let $d_i^{CT}$ denote the new delay required to completely execute $z_i^{CT}$.
We define $t' = \delta_{CT}(t)$ on test cases as follows:

$$\delta_{CT}(t) = \begin{cases} t_{1 \ldots i-1} \cdot t_i^{CT} \cdot t_{i+1 \ldots n} & \text{if } z_i = \textit{textentry} \\ t & \text{otherwise} \end{cases} \qquad (3)$$

where $n$ is the length of $t$, $i$ is the index of the randomly chosen action, and
$t_i^{CT} = (v_i, v_{i+1}, z_i^{CT}, d_i^{CT})$. An AUT may crash because
the corresponding onTextChange() method of the AUT throws an unhandled
exception (C1). The AUT may also crash if the text is an unexpected kind of
input, which later causes a semantic error (C4).
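A Python sketch of change text is shown below. It assumes that text-entry actions carry their string in a separate `text` field and keeps the original delay unless a new one is supplied, because the text does not say how the new delay for the mutated action is computed. It also uses 201 characters as the smallest length exceeding 200. All names are hypothetical.

```python
import random
import string
from typing import List, NamedTuple, Optional


class T(NamedTuple):
    start: str
    end: str
    action: str                 # e.g. "click", "text"
    delay: float
    text: Optional[str] = None  # payload of a text-entry action


def abnormal_text(op: str, rng: random.Random) -> str:
    if op == "emptytext":
        return ""   # delete the text
    if op == "dottext":
        return "."  # a single dot character
    if op == "longtext":
        # a random string longer than 200 characters
        return "".join(rng.choice(string.ascii_letters) for _ in range(201))
    raise ValueError(f"unknown operation: {op}")


def change_text(trace: List[T], rng: random.Random,
                new_delay: Optional[float] = None) -> List[T]:
    """Sketch of delta_CT: replace the text of one random text-entry action."""
    text_indices = [i for i, t in enumerate(trace) if t.action == "text"]
    if not text_indices:
        return list(trace)  # no text entry: the test case is unchanged
    op = rng.choice(["emptytext", "dottext", "longtext"])  # random abnormal operation
    i = rng.choice(text_indices)                           # random text-entry action
    old = trace[i]
    mutated = old._replace(text=abnormal_text(op, rng),
                           delay=old.delay if new_delay is None else new_delay)
    return trace[:i] + [mutated] + trace[i + 1:]
```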


M4. Toggle Contextual State ($\delta_{TCS}$). Existing test suites typically lack contextual
actions, even when the contextual state is crucial for triggering
the crash. Therefore, we introduce contextual state toggling with $t' = \delta_{TCS}(t)$,
which is defined as follows.

$$\delta_{TCS}(t) = t_{1 \ldots i} \cdot t_i^{TCS} \cdot t_{i+1 \ldots n} \qquad (4)$$

where $n$ is the length of test case $t$ and $t_i^{TCS}$ is a contextual action transition
$(v_{i+1}, v_{i+1}, z^{TCS}, d')$. $z^{TCS}$ corresponds to a random contextual toggle action. We
pick $d' = 10$ s for each contextual action since Android may take a long time
to stabilize after the contextual state changes. Toggling the contextual
state of the AUT may result in an external error (C2), or in a network-based crash
if connection failures are not handled correctly (C5).
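The sketch below inserts a single random contextual toggle with d' = 10 s after the first i transitions, in the state reached at that point. How i is chosen is left to the caller, and the toggle labels (taken from the contextual actions of Table 1) are plain strings of our choosing. This is an illustration under those assumptions, not TCM's implementation.

```python
import random
from typing import List, NamedTuple


class T(NamedTuple):
    start: str
    end: str
    action: str
    delay: float


# Contextual actions of Table 1 with their "toggle" parameter (labels are illustrative).
CONTEXTUAL_TOGGLES = ["connectivity toggle", "bluetooth toggle", "location toggle",
                      "planemode toggle", "doze toggle"]


def toggle_contextual_state(trace: List[T], i: int, rng: random.Random,
                            d_prime: float = 10.0) -> List[T]:
    """Sketch of delta_TCS: t_{1..i} . t_i^TCS . t_{i+1..n}, with 1 <= i <= n."""
    if not 1 <= i <= len(trace):
        raise ValueError("i must satisfy 1 <= i <= len(trace)")
    v = trace[i - 1].end  # state v_{i+1}, reached after transition t_i
    toggle = T(v, v, rng.choice(CONTEXTUAL_TOGGLES), d_prime)  # self-loop on v
    return trace[:i] + [toggle] + trace[i:]
```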


M5. Remove Delays ($\delta_{RD}$). $t' = \delta_{RD}(t)$ takes a test case $t$ and sets all of its
delays to 0. When replayed, the events of $t'$ occur in the same order as in $t$,
but are sent to the AUT at the earliest possible time.

$$\delta_{RD}(t) = (v_1, v_2, z_1, 0) \cdot (v_2, v_3, z_2, 0) \cdots (v_n, v_{n+1}, z_n, 0) \qquad (5)$$

If the AUT is communicating with another application, removing delays may
cause the requests to crash the other application. If this case is not handled
in the AUT, the AUT crashes due to external errors (C2). If the AUT's background
process is affected by the GUI actions, removing delays may cause the
background process to crash due to resource unavailability (C3). If the GUI
actions trigger network requests, having no delays may cause a network-based
crash (C5).
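Remove delays is the simplest operator. A Python sketch with hypothetical names only needs to zero every delay while preserving the states and the order of actions.

```python
from typing import List, NamedTuple


class T(NamedTuple):
    start: str
    end: str
    action: str
    delay: float


def remove_delays(trace: List[T]) -> List[T]:
    """Sketch of delta_RD: same transitions in the same order, every delay set to 0."""
    return [t._replace(delay=0) for t in trace]


D = [T("-", "v1", "reinit", 15), T("v1", "v1", "back", 0), T("v1", "v2", "click", 2)]
print(sum(t.delay for t in remove_delays(D)))  # 0
```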


M6. Faster Swipe ($\delta_{FS}$). $t' = \delta_{FS}(t)$ increases the speed of all swipe actions
of a test case $t$. Let $z_i^{FS}$ denote a faster version of $z_i$, where $z_i$ is a swipe action.
Then, we define $\delta_{FS}$ on test cases with at least one swipe action as follows.

$$\delta_{FS}(t) = t_1^{FS} \cdot t_2^{FS} \cdots t_n^{FS} \qquad (6)$$

where $n$ is the length of test case $t$ and

$$t_i^{FS} = \begin{cases} (v_i, v_{i+1}, z_i, d_i) & \text{if } z_i \text{ is not a swipe} \\ (v_i, v_{i+1}, z_i^{FS}, d_i) & \text{otherwise} \end{cases}$$

If the information presented by the AUT is downloaded from a network or
another application, swiping too fast may cause a network-based crash (C5) due
to the network being unable to provide the necessary data, or an external error
(C2). If the AUT is a game, swiping too fast may cause the AUT to throw an
unhandled exception (C1).
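The text does not fix how much faster the mutated swipe is. The Python sketch below assumes that a swipe's speed is controlled by its fifth parameter (duration, per Table 1) and halves that duration. The factor of 2, the millisecond unit, and all names are assumptions for illustration.

```python
from typing import List, NamedTuple, Union


class Swipe(NamedTuple):
    x1: int
    y1: int
    x2: int
    y2: int
    duration: int  # assumed to be in milliseconds; shorter means faster


class T(NamedTuple):
    start: str
    end: str
    action: Union[str, Swipe]  # non-swipe actions are plain strings here
    delay: float


def faster_swipe(trace: List[T], factor: int = 2) -> List[T]:
    """Sketch of delta_FS: speed up every swipe, leave other transitions unchanged."""
    out = []
    for t in trace:
        if isinstance(t.action, Swipe):
            faster = t.action._replace(duration=max(1, t.action.duration // factor))
            out.append(t._replace(action=faster))  # the delay d_i is kept
        else:
            out.append(t)
    return out


tc = [T("v1", "v1", Swipe(360, 1000, 360, 200, 400), 1), T("v1", "v2", "click", 1)]
print(faster_swipe(tc)[0].action.duration)  # 200
```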

Algorithm 1. Test Suite Minimization Algorithm
Require:
  TS : A test suite for the AUT
  M : AUT Model
Ensure:
  TS' : Minimized Test Suite

1: TS' ← ∅
2: for t ∈ {t : t ∈ TS ∧ t does not crash} do    ▷ Iterate over non-crashing test cases
3:   if cov_M(TS' ∪ {t}) > cov_M(TS') then    ▷ Take only the test cases that increase coverage
4:     t ← argmin_{t_{1...i}} i  s.t.  cov_M(TS' ∪ {t_{1...i}}) = cov_M(TS' ∪ {t})    ▷ Shorten the test case
5:     TS' ← TS' ∪ {t}    ▷ Add the shortened test case to the Minimized Test Suite
6:   end if
7: end for


Algorithm 2. Test Case Mutation (TCM) Algorithm
Require:
  TS : A Test Suite
  X : Timeout of the New Test Suite
  Δ : Set of Mutation Operators
Ensure:
  TS' : New Test Suite

1: TS' ← {}
2: x ← 0
3: repeat
4:   t ← random t ∈ TS    ▷ Pick a random test case
5:   δ ← random δ ∈ Δ s.t. t ≠ δ(t)    ▷ Pick a mutation operator that changes the test case
6:   t' ← δ(t)    ▷ Apply the mutation operator to the test case
7:   TS' ← TS' ∪ {t'}    ▷ Add the mutated test case to the New Test Suite
8:   x ← x + Σ_{(v_s, v_e, z, d) ∈ t'} d    ▷ Calculate the total delay
9: until x > X    ▷ Repeat until the total delay is above the given timeout


4 Test Suite Minimization and Test Case Mutation 


Before mutating the existing test cases in a test suite TS, we first minimize TS.
In order to minimize a test suite TS, we first define an edge coverage function
cov_M(TS) over the AUT model M as follows:

$$\mathit{cov}_M(TS) = \frac{\#\text{ of unique transitions covered in the AUT Model } M \text{ by } TS}{\#\text{ of all transitions in the AUT Model } M} \qquad (7)$$


We present our Test Suite Minimization approach in Algorithm 1. We iterate
over all non-crashing test cases of the original test suite TS in line 2. We use
only non-crashing test cases in Algorithm 1 because our goal is to generate crashes
from non-crashing test cases via mutation. We check whether the test case t increases
the edge coverage in line 3. If it does, we shorten t from its end by deleting
transitions that do not contribute to the edge coverage, and add the shortened
test case t' to the minimized test suite.
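Algorithm 1 and the edge coverage of Eq. (7) can be sketched in Python as follows. Assumptions: transitions of the AUT model are identified by (start, end, action) triples, the model's transition set is passed in explicitly, and a test case crashes if it reaches a state named "CRASH", as in Fig. 2. The function names are hypothetical. Coverage is compared by counting covered model transitions, which is equivalent to comparing cov_M because the denominator is fixed.

```python
from typing import List, NamedTuple, Set, Tuple


class T(NamedTuple):
    start: str
    end: str
    action: str
    delay: float


Edge = Tuple[str, str, str]  # (start, end, action)


def covered(suite: List[List[T]], model: Set[Edge]) -> Set[Edge]:
    # Unique model transitions exercised by the test suite.
    return {(t.start, t.end, t.action) for tc in suite for t in tc} & model


def coverage(suite: List[List[T]], model: Set[Edge]) -> float:
    # Eq. (7): covered transitions / all transitions of the AUT model.
    return len(covered(suite, model)) / len(model)


def crashes(tc: List[T]) -> bool:
    return any(t.end == "CRASH" for t in tc)


def minimize(suite: List[List[T]], model: Set[Edge]) -> List[List[T]]:
    """Sketch of Algorithm 1: keep coverage-increasing, non-crashing test cases,
    each shortened to its shortest prefix that reaches the same coverage."""
    result: List[List[T]] = []
    for tc in suite:
        if crashes(tc):
            continue  # only non-crashing test cases are considered
        target = len(covered(result + [tc], model))
        if target > len(covered(result, model)):  # tc increases coverage
            for i in range(1, len(tc) + 1):  # smallest prefix t_{1..i}
                if len(covered(result + [tc[:i]], model)) == target:
                    result.append(tc[:i])
                    break
    return result


# Test cases of Fig. 2a; the model is approximated by the union of their transitions.
A = [T("-", "v1", "reinit", 10), T("v1", "v2", "click", 1), T("v2", "v1", "back", 1),
     T("v1", "v2", "click", 1), T("v2", "v1", "back", 1)]
B = [T("-", "v1", "reinit", 8), T("v1", "v3", "menu", 2), T("v3", "CRASH", "menu", 1)]
C = [T("-", "v1", "reinit", 9), T("v1", "v1", "back", 0), T("v1", "v2", "click", 1),
     T("v2", "v3", "click", 2), T("v3", "CRASH", "menu", 2)]
D = [T("-", "v1", "reinit", 15), T("v1", "v1", "back", 0), T("v1", "v2", "click", 2),
     T("v2", "v1", "back", 1), T("v1", "v3", "menu", 3)]
model = {(t.start, t.end, t.action) for tc in (A, B, C, D) for t in tc}
print(minimize([D, A, B, C], model) == [D])  # True
print(minimize([A, D], model) == [A[:3], D])  # True
```

Processing D before A reproduces the outcome described in the motivating example (only D is kept). Processing A first also keeps the prefix A[:3], so the result depends on the iteration order, which Algorithm 1 does not fix.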

We present our Test Case Mutation approach in Algorithm 2. We pick a
random test case t from the given TS in line 4. Then, we pick a random mutation
operator δ that changes t in line 5. We mutate t with δ, add the mutated
test case t' to TS', and repeat until the total delay of TS' exceeds the given timeout X.
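A Python sketch of Algorithm 2 follows. Test cases are lists of delayed transitions and operators are plain functions. Checking t ≠ δ(t) by evaluating each operator assumes deterministic operators. Two safeguards are our own additions and are not part of Algorithm 2. First, the function raises an error if no operator changes any test case. Second, `max_mutants` caps the loop, since mutants with zero total delay (e.g. after remove delays) would otherwise never advance x. Mutants are kept in a list, so duplicates are possible, unlike in the set TS'.

```python
import random
from typing import Callable, List, NamedTuple


class T(NamedTuple):
    start: str
    end: str
    action: str
    delay: float


TestCase = List[T]
Operator = Callable[[TestCase], TestCase]


def tcm(suite: List[TestCase], operators: List[Operator], timeout: float,
        rng: random.Random, max_mutants: int = 10_000) -> List[TestCase]:
    """Sketch of Algorithm 2: mutate random test cases until the total delay exceeds timeout."""
    if not any(op(t) != t for t in suite for op in operators):
        raise ValueError("no operator changes any test case")
    new_suite: List[TestCase] = []
    x = 0.0  # total delay of the new suite
    while True:
        t = rng.choice(suite)  # pick a random test case
        changing = [op for op in operators if op(t) != t]
        if not changing:
            continue  # no operator changes this test case; pick another one
        delta = rng.choice(changing)  # random operator with t != delta(t)
        t_mut = delta(t)
        new_suite.append(t_mut)
        x += sum(tr.delay for tr in t_mut)
        if x > timeout or len(new_suite) >= max_mutants:
            return new_suite
```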


(a) Test Cases (each row: # | start state | end state | action | delay in s)

Test Case A
1 | - | v1 | reinit | 10
2 | v1 | v2 | click | 1
3 | v2 | v1 | back | 1
4 | v1 | v2 | click | 1
5 | v2 | v1 | back | 1

Test Case B
1 | - | v1 | reinit | 8
2 | v1 | v3 | menu | 2
3 | v3 | CRASH | menu | 1

Test Case C
1 | - | v1 | reinit | 9
2 | v1 | v1 | back | 0
3 | v1 | v2 | click | 1
4 | v2 | v3 | click | 2
5 | v3 | CRASH | menu | 2

Test Case D
1 | - | v1 | reinit | 15
2 | v1 | v1 | back | 0
3 | v1 | v2 | click | 2
4 | v2 | v1 | back | 1
5 | v1 | v3 | menu | 3

(b) AUT Model

(c) Mutated Test Cases

Mutated 1
1 | - | v1 | reinit | 15
2* | v1 | v1 | back | 1
3* | v1 | v1 | back | 1
4* | v1 | v1 | back | 1
5* | v1 | v1 | back | 1
6* | v1 | v1 | back | 1
7* | v1 | v1 | back | 1
8* | v1 | v1 | back | 1
9* | v1 | v1 | back | 1
10* | v1 | v1 | back | 1
11 | v1 | v1 | back | 0
12 | v1 | v2 | click | 2
13 | v2 | v1 | back | 1
14 | v1 | v3 | menu | 3

Mutated 2
1 | - | v1 | reinit | 15
2* | v1 | - | doze off | 2
3* | - | v1 | doze on | 2
4 | v1 | v1 | back | 0
5* | v1 | - | doze off | 2
6* | - | v1 | doze on | 2
7 | v1 | v2 | click | 2
8* | v2 | - | doze off | 2
9* | - | v2 | doze on | 2
10 | v2 | v1 | back | 1
11* | v1 | - | doze off | 2
12* | - | v1 | doze on | 2
13 | v1 | v3 | menu | 3

Fig. 2. Motivating example (mutations are marked with *)


5 Motivating Example 


Figures 2a and b show a test suite and an AUT model, respectively. We generate 
this test suite and the AUT model by executing AndroFrame for one minute on 
an example AUT. We execute AndroFrame for just one minute, because that is 
enough to generate test cases for this example. We limit the maximum number 
of transitions per test case to five to keep the test cases small in this motivating
example for ease of presentation. The test suite has four test cases: A, B, C, and
D. Each row of a test case describes a delayed transition. The click action has
coordinates, but we abstract this information for the sake of simplicity.

Among the four test cases reported by AndroFrame, we take only the non- 
crashing test cases, A and D. In our example, we include D since it increases 
the edge coverage and we exclude A since all of A’s transitions are also D’s 
transitions, i.e. A is subsumed by D. Then, we attempt to minimize test case 
D without reducing the edge coverage. In our example, we don’t remove any 
transitions from D because all transitions in D contribute to the edge coverage. 
We then generate mutated test cases by randomly applying mutation operators 
to D one by one until we reach one minute timeout. Figure 2c shows an example 
mutated test suite. Test case Mutated 1 takes D and presses the back button
multiple times to stress the loop at state v1. Test case Mutated 2 presses the
hardware power button twice (doze off, doze on) between each pair of consecutive
transitions. This operation pauses and resumes the AUT on our test devices. We then
execute all mutated test cases on the AUT. Our example AUT in fact crashes when the loop
on v1 is reexecuted more than eight times, and also crashes when the AUT is
paused in state v2. When executed, both mutated test cases reveal these crashes
at their ninth transition, doubling the number of detected crashes.

Fig. 3. Number of total distinct crashes detected across time


6 Evaluation 


In this section, we evaluate TCM via experiments and case studies. Through
experiments, we show that TCM improves crash detection. Through case studies,
we then show how TCM detects each crash pattern.


6.1 Experiments 


We selected 100 AUTs (excluding the case studies described later) from F-Droid
benchmarks [7] for our experiments. To evaluate the improvement in crash detection,
we first execute AndroFrame, Sapienz, PUMA, Monkey, and A3E for 20 min each
on these applications, with no mutations applied to test cases. Then we execute
TCM for the same 20 min budget: 10 min for AndroFrame to generate test cases and
10 min to mutate the generated test cases and replay them to detect more crashes.
AndroFrame requires the maximum length of a test case as a parameter. We used its
default value, a maximum of 80 transitions per test case.

Figure 3 shows the number of total distinct crashes detected by each tool 
across time. Whenever a crash occurs, the Android system logs the resulting 
stack trace. We say that two crashes are distinct if stack traces of these crashes 
are different. 
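Counting distinct crashes over time, as plotted in Fig. 3, can be sketched in Python as below. The sketch assumes that each logged fatal exception is given as a (seconds since start, stack trace) pair and compares stack traces as exact strings. In practice, traces may need normalization (e.g. removing process IDs) before comparison, which this sketch does not attempt. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cumulative_distinct_crashes(events: Iterable[Tuple[float, str]],
                                checkpoints: List[float]) -> List[int]:
    """For each checkpoint (in seconds), count the crashes with distinct stack traces
    whose first occurrence happened no later than that checkpoint."""
    first_seen: Dict[str, float] = {}
    for time_s, stack_trace in events:
        if stack_trace not in first_seen or time_s < first_seen[stack_trace]:
            first_seen[stack_trace] = time_s
    times = sorted(first_seen.values())
    return [sum(1 for ts in times if ts <= c) for c in checkpoints]


log = [(30.0, "trace A"), (45.0, "trace A"), (200.0, "trace B"),
       (700.0, "trace C"), (710.0, "trace B")]
print(cumulative_distinct_crashes(log, [60, 600, 1200]))  # [1, 2, 3]
```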

Our results show that AndroFrame detects more crashes than any other tool
from very early on. TCM detects the same number of crashes as AndroFrame
for the first 10 min (600 s). During that time, AndroFrame detects 15 crashes. In
the last 10 min, TCM detects 14 more crashes, whereas AndroFrame detects only
3 more crashes. As a result, TCM detects 29 crashes in total, whereas AndroFrame
detects 18 crashes in total. Finally, all other tools, including AndroFrame,
seem to stabilize after 20 min, whereas TCM finds many crashes near the timeout.
This suggests that TCM may find even more crashes with a longer timeout.

(a) Execution of Test Case $t$    (b) Execution of Test Case $t' = \delta_{CT}(t)$


Fig. 4. An example crash found only by TCM 


Overall, TCM finds 11 more crashes than AndroFrame (29 versus 18) and 17 more
crashes than Sapienz, the best among the other tools.

We also investigate how much each mutation operator contributes to the
number of detected crashes. Our observations reveal that M1 ($\delta_{LS}$) detects one
crash, M2 ($\delta_{PR}$) detects four crashes, M3 ($\delta_{CT}$) detects two crashes, M4 ($\delta_{TCS}$)
detects two crashes, M5 ($\delta_{RD}$) detects four crashes, and M6 ($\delta_{FS}$) detects one
crash. These crashes add up to 14 (1 + 4 + 2 + 2 + 4 + 1), which is the number of
crashes detected by TCM in the last 10 min. This result shows that while all mutation
operators contribute to crash detection, M2 and M5 have the largest contribution.

We present and explain one crash that is found only by TCM in Fig. 4.
Figure 4a shows an instance where AndroFrame generates and executes a test
case t on the Yahtzee application. Note that t does not lead to a crash, but only
to a warning message. Figure 4b shows the instance where TCM mutates t and
executes the mutated test case t'. When t' is executed, the application crashes
and terminates. We note that this crash was not found by any other tool. Mao
et al. [8] also report that Sapienz and Dynodroid did not find any crashes in this
application.


6.2 Case Studies 


In this section, we verify via case studies that the aforementioned crash patterns
exist, with one case study for each crash pattern. These studies verify that all of
our crash patterns are observable on the Android platform. These case studies also
helped us develop and fine-tune our mutation operators.


Case Study 1. Figure 5a shows a crashing activity of the SoundBoard application
included in F-Droid benchmarks. The coin and tube buttons
activate a third party library, AudioFlinger, to produce sound when tapped.
AndroFrame generates test cases which tap these buttons. These test cases produce
no crashes. Then, we mutate the test cases with TCM. When we apply
loop-stressing (M1) on any of these buttons, AudioFlinger crashes due to overuse.
AudioFlinger produces a fatal exception (C1) in Android logs. This crash does
not cause an abnormal termination, but it causes the AUT to stop functioning
(the AUT stops producing sounds until it is restarted).

Fig. 5. Case studies 1-5. Panels: (a) Unhandled Exception (C1) Example; (b) External Error (C2) Example; (d) Semantic Error (C4) Example; (e) Network-Based Crash (C5) Example


Case Study 2. Figure 5b shows a crashing activity of the a2dpVol application
included in F-Droid benchmarks, for which AndroFrame fails to generate
crashing test cases. We mutate AndroFrame's test cases with TCM. When we activate
bluetooth (M4), tapping the find devices button produces a crash in the external
android.bluetooth.IBluetooth application due to a missing method (C2), and the
AUT terminates.


Case Study 3. Figure 5c shows a crashing activity of the importcontacts
application included in F-Droid benchmarks. The AUT handles the case in which
it fails to import contacts, as we show in the leftmost screen. Pausing the AUT at
this screen causes the background process to abort and free its allocated memory
(shown in the middle screen). However, the paused activity is not destroyed. If
the user tries to resume this activity, the AUT crashes, as shown in the rightmost
screen, because the memory has already been freed. TCM applies a pause-resume
mutation (M2) and triggers this resource unavailability crash (C3).
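The pause-resume mutation (M2) inserts a pause and a resume of the current activity at a chosen point of an existing test case. A minimal sketch, assuming the same hypothetical (source state, action, target state) representation as before and assuming the replayer understands two pseudo-actions, PAUSE and RESUME; these names and the state labels are illustrative only.

```python
from typing import List, Tuple

Transition = Tuple[str, str, str]  # (source state, action, target state)

# Hypothetical pseudo-actions understood by the test replayer.
PAUSE = "pause activity"
RESUME = "resume activity"


def pause_resume(test_case: List[Transition], index: int) -> List[Transition]:
    """Insert a pause followed by a resume right after transition `index`.

    The AUT is expected to come back to the same GUI state, so both inserted
    transitions start and end in the target state of transition `index`.
    """
    _src, _action, state = test_case[index]
    inserted = [(state, PAUSE, state), (state, RESUME, state)]
    return test_case[: index + 1] + inserted + test_case[index + 1:]


# Usage: pause and resume on the (hypothetical) "import failed" screen.
original = [("start", "tap import", "import_failed"),
            ("import_failed", "tap retry", "start")]
mutated = pause_resume(original, 0)
print(mutated[1], mutated[2])
```

Keeping the mutation as a pure function over the transition list means the original test case stays available for comparison when the mutated one crashes.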


Case Study 4. Figure 5d shows a crashing activity of the aCal application
included in F-Droid benchmarks. AndroFrame generates test cases with
well-behaved text inputs. These test cases produce no crashes. We then mutate
the test cases with TCM. Applying change text (M3) to the last text box and then
tapping the configure button produces a semantic error (C4), and the AUT
crashes and terminates.
Case Study 5. Figure 5e shows a crashing activity of the Mirrored application
included in F-Droid benchmarks. When Wi-Fi is turned off, the AUT goes into
offline mode and does not crash, as shown in the leftmost screen. When we toggle
Wi-Fi (M4), the AUT retrieves several articles, as shown in the middle screen,
but then fails to retrieve the article contents and crashes with a network-based
crash (C5), as shown in the rightmost screen.


7 Discussion 


Although TCM is conceptually applicable to other GUI platforms, e.g., iOS or
desktop computers, there are three key challenges. First, our crash patterns are
not guaranteed to exist or be observable on other platforms. Second, our
mutation operators may not be applicable to those platforms; e.g., swipe may not
be available as a gesture. Third, on those platforms it may be impossible either
to obtain an AUT model or to generate a replayable test case. Once these
challenges are addressed, we believe TCM should be applicable not just to
Android, but to other platforms as well.

TCM mutates test cases after they are generated. We could apply mutated
inputs immediately during test generation. However, this requires us to alter the
test generation process, which may not be possible if a third-party test
generation tool is used. Our approach is conceptually applicable to any test
generation tool without altering that tool.

We use an edge coverage criterion to minimize a given test suite. As a result, the
original test suite potentially covers more paths than the minimized test suite,
since it may exercise the same edge in different contexts. Without minimization,
the test suite contains too many test cases, and they are too large, to generate
enough mutations to observe crashes within the given timeout. Therefore, we
argue that minimizing the test suite improves the crash detection performance of
TCM, at the cost of the test suite's completeness with respect to coverage
criteria stronger than edge coverage.
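The minimization algorithm itself is not spelled out here. One common way to minimize a suite with respect to edge coverage is the greedy set-cover heuristic, sketched below as an assumption rather than TCM's actual procedure: each test case is represented by the set of model edges it covers, and the function name minimize_by_edges and the edge labels are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def minimize_by_edges(suite: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedily pick test cases until every edge covered by `suite` is covered.

    At each step the test case covering the most still-uncovered edges is chosen;
    ties go to the test case that appears first in `suite`.
    """
    uncovered = set().union(*suite.values())
    remaining = dict(suite)
    selected: List[str] = []
    while uncovered:
        # some remaining test always covers an uncovered edge, so this terminates
        best = max(remaining, key=lambda name: len(remaining[name] & uncovered))
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected


suite = {
    "t1": {"s0-a->s1", "s1-b->s2"},
    "t2": {"s0-a->s1"},             # redundant: t1 already covers this edge
    "t3": {"s1-c->s3"},
}
print(minimize_by_edges(suite))  # ['t1', 't3']
```

Note how t2 is dropped even though, in the full suite, it may reach edge s0-a->s1 in a different context than t1; this is exactly the completeness traded away above.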

Although TCM detects crashes, it does not detect all possible bug patterns.
Qin et al. [9] thoroughly classify bugs in Android. According to this
classification, there are two types of bugs in Android: Bohrbugs and
Mandelbugs. A Bohrbug is a bug whose reachability and propagation are simple.
A Mandelbug is a bug whose reachability and propagation are complicated. Qin et
al. further categorize Mandelbugs into Aging-Related Bugs (ARB) and
Non-Aging-Related Mandelbugs (NAM), and define five subtypes of NAM and six
subtypes of ARB. TCM detects only the first two subtypes of NAM, namely TIM
and SEQ, which are the only kinds of bugs triggered by user inputs. A TIM bug is
caused by the timing of inputs, and a SEQ bug is caused by the sequencing of
inputs.

We note two key points on the crash patterns of TCM. First, the testing tools
we compare TCM against detect only SEQ bugs; TCM additionally detects TIM
bugs. Second, Azim et al. [3] further divide SEQ and TIM bugs into six crash
patterns, on which we base our own crash patterns. We merge external errors and
permission violations into one crash pattern, since permission violations occur
as attempts to communicate with external applications without sufficient
permissions. As a result, we obtain five crash patterns.

We did not encounter any crash patterns other than the five crash patterns
that we describe in Sect. 3. However, our mutation operators may still expose
other crash patterns, since new ones keep emerging due to the fragmentation and
fast development of the Android platform.

Our mutation operators insert multiple transitions into a test case, which makes
it harder to locate the fault-inducing transition. When a mutated test case
detects a crash, fault localization can be achieved using a variant of delta
debugging [10].
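To illustrate how delta debugging could isolate the fault-inducing transitions, here is a minimal sketch of a simplified variant of Zeller's ddmin that only tests complements (removing one chunk at a time). The oracle `crashes` is a hypothetical callback that would replay a candidate transition list on the AUT and report whether the same crash recurs; in the usage example it is a toy predicate over integers.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(transitions: Sequence[T], crashes: Callable[[List[T]], bool]) -> List[T]:
    """Shrink a crashing transition list to a 1-minimal crashing subsequence.

    Split the list into n chunks and try removing each chunk; keep the first
    complement that still crashes, otherwise refine the split.
    """
    current = list(transitions)
    if not crashes(current):
        raise ValueError("the input test case must crash")
    n = 2
    while len(current) >= 2:
        chunk = len(current) // n
        reduced = False
        start = 0
        while start < len(current):
            complement = current[:start] + current[start + chunk:]
            if crashes(complement):
                current = complement
                n = max(n - 1, 2)
                reduced = True
                break
            start += chunk
        if not reduced:
            if n == len(current):
                break  # removing any single transition loses the crash
            n = min(n * 2, len(current))
    return current


def toy_crashes(seq: List[int]) -> bool:
    # Toy oracle: the "crash" needs both transition 3 and transition 5.
    return 3 in seq and 5 in seq


print(ddmin(range(8), toy_crashes))  # [3, 5]
```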

We use regular expressions on the Android logs to detect crashes. In the
experiment, we only detected errors labeled FATAL EXCEPTION, as done in
previous work [3,5], ignoring Application Not Responding (ANR) and other
errors described by Carino and Andrews [11]. Although we believe that TCM
would still detect more crashes than pure AndroFrame (fatal exception is the
most common crash in Android), we will improve our crash detection procedure
in future work to give more accurate results.
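As a rough illustration of regex-based crash detection, the sketch below scans logcat-style lines for the FATAL EXCEPTION marker and, if present, the "Process: <name>, PID: <pid>" line that usually follows it. The exact expressions used in the experiment are not given, so both patterns, the lookahead window, and the sample lines (including the package names) are assumptions. ANR reports are deliberately not matched, as in the experiment.

```python
import re
from typing import List, Optional, Sequence, Tuple

FATAL_RE = re.compile(r"FATAL EXCEPTION")
# Assumed layout of the line that names the crashing process.
PROCESS_RE = re.compile(r"Process: ([\w.]+), PID: (\d+)")


def find_fatal_crashes(log_lines: Sequence[str],
                       lookahead: int = 3) -> List[Tuple[int, Optional[str]]]:
    """Return (line index, crashing package or None) for every FATAL EXCEPTION entry."""
    crashes = []
    for i, line in enumerate(log_lines):
        if not FATAL_RE.search(line):
            continue
        package = None
        for follow in log_lines[i + 1:i + 1 + lookahead]:
            match = PROCESS_RE.search(follow)
            if match:
                package = match.group(1)
                break
        crashes.append((i, package))
    return crashes


sample = [
    "E AndroidRuntime: FATAL EXCEPTION: main",
    "E AndroidRuntime: Process: org.example.app, PID: 4242",
    "E AndroidRuntime: java.lang.NullPointerException",
    "E ActivityManager: ANR in org.example.other",
]
print(find_fatal_crashes(sample))  # [(0, 'org.example.app')]
```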

We randomly selected 100 Android applications from the well-known F-Droid
benchmarks, which are also used by other testing tools [7]. In our previous work,
we showed that these applications have characteristics similar to those of the
rest of the F-Droid applications.


8 Related Work 


Test Case Mutation (TCM) differs from the well-known Mutation Testing (MT)
[12], where mutations are inserted into the source code of an AUT to measure the
quality of existing test cases. In TCM, by contrast, we update existing test cases
to increase the number of detected crashes. Oliveira et al. [13] were the first to
suggest using MT for GUIs. Deng et al. [14] define several source-code-level
mutation operators for Android applications to measure the quality of existing
test suites.

The concept of Test Case Mutation is not new. In Android GUI testing,
Sapienz [8] and EvoDroid [15] are testing tools that use evolutionary algorithms,
and therefore mutation operators. Sapienz shuffles the order of events, whereas
EvoDroid mutates a test case in two ways: (1) it transforms text inputs, and (2) it
injects, swaps, or removes events. TCM not only mutates text inputs but also
introduces five additional novel mutation operators. Furthermore, Sapienz and
EvoDroid use their mutation operators for both exploration and crash detection,
whereas we specialize TCM's mutation operators for crash detection only. In
standard GUI testing, MuCRASH [16] performs test case mutation by defining
special mutation operators on test cases at the source code level. MuCRASH
uses test case mutation for crash reproduction, whereas ours is the first work
that uses TCM to discover new crashes. Directed Test Suite Augmentation
(DTSA), introduced by Xu et al. in 2010 [17], also mutates existing test cases, but
with the goal of achieving a target branch coverage.

We implement TCM on AndroFrame [4], one of the state-of-the-art Android GUI
testing tools. AndroFrame finds more crashes than other available alternatives in
the literature, such as A3E and Sapienz. These tools also generate replayable test
cases and provide the necessary utilities to replay them. We could mutate their
test cases, but most of our mutations would not be applicable, for two reasons.
First, A3E and Sapienz do not learn a model from which we can extract looping
actions. Second, A3E and Sapienz do not support contextual state toggling.
Implementing all of our mutations on top of these tools is possible, but requires a
significant amount of engineering effort. Therefore, we implement TCM on top of
AndroFrame.

Other black-box testing tools in the literature include A3E [18], SwiftHand
[6], PUMA [19], Dynodroid [20], Sapienz [8], EvoDroid [15], CrashScope [5], and
MobiGUITAR [21]. Of these tools, only EvoDroid, CrashScope, and
MobiGUITAR are not publicly available.

Monkey is a simple random, generation-based fuzz tester for Android. Monkey
detects the largest number of crashes among the black-box testing tools.
Generation-based fuzz testing, which generates random or unexpected inputs, is a
popular approach in Android GUI testing. Fuzzing can be completely random, as
in Monkey, or more intelligent, detecting relevant events as in Dynodroid [20].
TCM can be viewed as a mutation-based fuzz testing tool, where we modify
existing test cases rather than generating test cases from scratch. TCM can be
implemented on top of Monkey or Dynodroid to improve the crash detection of
these tools.

Baek and Bae [22] define a comparison criterion for Android GUI states. 
AndroFrame uses the maximum comparison level described in this work, which 
makes our models as fine-grained as possible for black-box testing. 


9 Conclusion 


In this study, we developed a novel test case mutation technique that increases
the detection of crashes in Android applications. We defined six mutation
operators for GUI test cases and related them to commonly occurring crash
patterns in Android applications. We obtained test cases through a
state-of-the-art Android GUI testing tool called AndroFrame. We showed with
several case studies that our mutation operators are able to uncover new crashes.

As future work, we plan to study a broader set of GUI actions, such as
rotation and double-click. We will improve our mutation algorithm by sampling
mutation operators from a probability distribution based on crash rates rather
than from a uniform distribution. We will determine optimal timings for
executing the test generator and TCM, rather than dividing the available time
into two equal halves. We will further investigate Android crash patterns.
Abstract. In this paper, we present CRETE, a versatile binary-level concolic
testing framework, which features an open and highly extensible architecture
allowing easy integration of concrete execution frontends and symbolic execution
engine backends. CRETE's extensibility is rooted in its modular design, where
concrete and symbolic execution are loosely coupled only through standardized
execution traces and test cases. The standardized execution traces are
LLVM-based, self-contained, and composable, providing succinct and sufficient
information for symbolic execution engines to reproduce the concrete executions.
We have implemented CRETE with KLEE as the symbolic execution engine and
multiple concrete execution frontends such as QEMU and 8051 Emulator. We
have evaluated the effectiveness of CRETE on GNU COREUTILS programs and
TianoCore utility programs for UEFI BIOS. The evaluation of COREUTILS
programs shows that CRETE achieved code coverage comparable to that of KLEE
directly analyzing the source code of COREUTILS, and generally outperformed
ANGR. The evaluation of TianoCore utility programs found numerous exploitable
bugs that were previously unreported.


1 Introduction 


Symbolic execution [1] has become an increasingly important technique for
automated software analysis, e.g., generating test cases, finding bugs, and
detecting security vulnerabilities [2-11]. There have been many recent
approaches to symbolic execution [12-22]. Generally speaking, these approaches
can be classified into two categories: online symbolic execution (e.g., BitBlaze
[4], KLEE [5], and S2E [6]) and concolic execution (a.k.a. offline symbolic
execution, e.g., CUTE [2], DART [3], and SAGE [7]). Online symbolic execution
closely couples Symbolic Execution Engines (SEE) with the System Under Test
(SUT) and explores all possible execution paths of the SUT online at once.
Concolic execution, on the other hand, decouples the SEE from the SUT through
traces: it concretely runs a single execution path of the SUT and then
symbolically executes it.
Both online and offline symbolic execution face new challenges, as computer
software is experiencing explosive growth in both complexity and diversity,
ushered in by the proliferation of cloud computing, mobile computing, and the
Internet of Things. Two major challenges are: (1) the SUT involves many types of
software for different hardware platforms, and (2) the SUT involves many
components distributed on different machines, so that the SUT as a whole cannot
fit in any SEE. In this paper, we focus on how to extend concolic execution to
satisfy the needs of analyzing emerging software systems. There are two major
observations behind our efforts on extending concolic execution:


— The decoupled architecture of concolic execution provides the flexibility to
integrate new trace-capturing frontends for emerging platforms.

— The trace-based nature of concolic testing offers opportunities for selectively 
capturing and synthesizing reduced system-level traces for scalable analysis. 


We present CRETE, a versatile binary-level concolic testing framework, which 
features an open and highly extensible architecture allowing easy integration of 
concrete execution frontends and symbolic execution backends. CRETE's
extensibility is rooted in its modular design, where concrete and symbolic
execution are loosely coupled only through standardized execution traces and test cases. The
standardized execution traces are LLVM-based, self-contained, and composable, 
providing succinct and sufficient information for SEE to reproduce the concrete 
executions. The CRETE framework is composed of: 


— A CRETE tracing plugin, which is embedded in the concrete execution 
environment, captures binary-level execution traces of the SUT, and stores the 
traces in a standardized trace format. 

— A CRETE manager, which archives the captured execution traces and test 
cases, schedules concrete and symbolic execution, and implements policies for 
selecting the traces and test cases to be analyzed and explored next. 

— A CRETE replayer, which is embedded in the symbolic execution environ- 
ment, performs concolic execution on captured traces for test case generation. 


We have implemented the CRETE framework on top of QEMU [23] and KLEE,
particularly the tracing plugin for QEMU, the replayer for KLEE, and the
manager that coordinates QEMU and KLEE to exchange runtime traces and test
cases and manages the policies for prioritizing them. To validate CRETE's
extensibility, we have also implemented a tracing plugin for the 8051 emulator
[24]. The trace-based architecture of CRETE has enabled us to integrate such
tracing frontends seamlessly. To demonstrate its effectiveness and capability, we
evaluated CRETE on GNU COREUTILS programs and TianoCore utility programs
for UEFI BIOS, and compared it with KLEE and ANGR, two state-of-the-art
open-source symbolic executors for automated program analysis at the source
level and the binary level, respectively.

The CRETE framework makes several key contributions: 


— Versatile concolic testing. CRETE provides an open and highly extensible 
architecture allowing easy integration of different concrete and symbolic exe- 
cution environments, which communicate with each other only by exchanging 
standardized traces and test cases. This significantly improves applicability
and flexibility of concolic execution to emerging platforms and is amenable to 
leveraging new advancements in symbolic execution. 

— Standardizing runtime traces. CRETE defines a standard binary-level trace
format, which is LLVM-based, self-contained, and composable. Such a trace is
captured during concrete execution, representing an execution path of a SUT.
It contains succinct and sufficient information for reproducing the execution
path in other program analysis environments, such as symbolic execution.
Having standardized traces minimizes the need to convert traces for different
analysis environments and provides a basis for common trace-related
optimizations.

— Implemented a CRETE prototype. We have implemented CRETE with
KLEE as the SEE backend and multiple concrete execution frontends such
as QEMU and 8051 Emulator. CRETE achieved code coverage on COREUTILS
binaries comparable to that of KLEE directly analyzing the source code, and
generally outperformed ANGR. CRETE also found 84 distinct and previously
unreported crashes in widely used and extensively tested utility programs for
UEFI BIOS development. We also make the CRETE implementation publicly
available to the community at github.com/SVL-PSU/crete-dev.


2 Related Work 


DART [3] and CUTE [2] are both early representative works on concolic testing.
They operate at the source code level. CRETE further extends concolic testing
and targets closed-source binary programs. SAGE [7] is a Microsoft-internal
concolic testing tool that particularly targets x86 binaries on Windows. CRETE
is platform-agnostic: as long as a trace from concrete execution can be converted
into the LLVM-based trace format, it can be analyzed to generate test cases.

KLEE [5] is a source-level symbolic executor built on the LLVM infrastructure
[25] and is capable of generating high-coverage test cases for C programs.
CRETE adopts KLEE as its SEE and extends it to perform concolic execution on
standardized binary-level traces. S2E [6] provides a framework for developing
tools that analyze closed-source software programs. It augments a Virtual
Machine (VM) with a SEE and path analyzers, and features a tight coupling of
concrete and symbolic execution. CRETE instead takes a loosely coupled
approach to the interaction of concrete and symbolic execution: it captures
complete execution traces of the SUT online and conducts whole-trace symbolic
analysis offline.

BitBlaze [4] is an early representative work on binary analysis for computer
security. BitBlaze and its follow-up works Mayhem [8] and MergePoint [12] focus
on optimizing the close coupling of concrete and symbolic execution to improve
the effectiveness of detecting exploitable software bugs. CRETE has a different
focus: providing an open architecture for binary-level concolic testing that
enables flexible integration of various concrete and symbolic execution
environments.

ANGR [14] is an extensible Python framework for binary analysis using
VEX [26] as an intermediate representation (IR). It implemented a number of
existing analysis techniques and enabled the comparison of different techniques
in a single platform. ANGR needs to load a SUT in its own virtual environment
for analysis, so it has to model the real execution environment of the SUT, such
as system calls and common library functions. CRETE, however, performs
in-vivo binary analysis by analyzing binary-level traces captured from the
unmodified execution environment of a SUT. Also, ANGR needs to maintain
execution states for all paths being explored at once, while CRETE reduces
memory usage dramatically by analyzing a SUT path by path and by separating
symbolic execution from tracing.

Our work is also related to fuzz testing [27]. A popular representative tool for
fuzzing is AFL [28]. Fuzzing is fast and quite effective for bug detection;
however, it can easily get stuck when a specific input, such as a magic number, is
required to pass a check and explore new paths of a program. Concolic testing
guides the generation of test cases by solving constraints from the source code or
binary execution traces, and is quite effective in generating complicated inputs.
Therefore, fuzzing and concolic testing are complementary software testing
techniques.
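To see why a magic-number check stalls purely random input generation, consider a branch guarded by a comparison of a 32-bit input x against one fixed constant c. Assuming inputs are drawn uniformly at random (a simplification; fuzzers such as AFL mutate seeds rather than sampling uniformly), the chance of taking the branch on one try and the expected number of tries until it is taken are:

$$P(x = c) = 2^{-32} \approx 2.33 \times 10^{-10}, \qquad \mathbb{E}[\text{tries}] = \frac{1}{P(x = c)} = 2^{32} = 4\,294\,967\,296.$$

A concolic tester instead records the branch condition x = c along the traced path, negates the untaken outcome, and obtains the required input from a single constraint-solver query.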


3 Overview 


During the design of the CRETE framework for binary-level concolic testing, we 
have identified the following design goals: 


— Binary-level In-vivo Analysis. It should require only the binary of the SUT 
and perform analysis in its real execution environment. 

— Extensibility. It should allow easy integration of concrete execution fron- 
tends and SEE backends. 

— High Coverage. It should achieve coverage that is not significantly lower 
than the coverage attainable by source-level analysis. 

— Minimal Changes to Existing Testing Processes. It should simply pro- 
vide additional test cases that can be plugged into existing testing processes 
without major changes to the testing processes. 


To achieve the goals above, we adopt an online/offline approach to concolic
testing in the design of the CRETE framework:


— Online Tracing. As the SUT is concretely executed in a virtual or physical 
machine, an online tracing plugin captures the binary-level execution trace 
into a trace file. 

— Offline Test Generation. An offline SEE takes the trace as input, injects 
symbolic values and generates test cases. The new test cases are in turn applied 
to the SUT in the concrete execution. 


This online tracing and offline test generation process is iterative: it repeats until 
all generated test cases are issued or time bounds are reached. We extend this 
process to satisfy our design goals as follows. 
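The iterative online/offline process can be pictured as a worklist loop. This is a minimal sketch under stated assumptions: run_and_trace stands for concrete execution plus online tracing, generate_tests for offline symbolic execution on one trace, and both names, like the string-based toy test cases in the usage example, are hypothetical rather than CRETE's actual interfaces.

```python
import time
from collections import deque
from typing import Callable, Hashable, Iterable, List


def concolic_loop(
    seed: Hashable,
    run_and_trace: Callable[[Hashable], object],
    generate_tests: Callable[[object], Iterable[Hashable]],
    time_budget_s: float = 60.0,
) -> List[Hashable]:
    """Alternate online tracing and offline test generation until every
    generated test case has been issued or the time budget is exhausted."""
    deadline = time.monotonic() + time_budget_s
    pending = deque([seed])
    issued = {seed}
    executed: List[Hashable] = []
    while pending and time.monotonic() < deadline:
        test = pending.popleft()
        trace = run_and_trace(test)              # online: concrete run + tracing
        executed.append(test)
        for new_test in generate_tests(trace):   # offline: symbolic execution on the trace
            if new_test not in issued:           # never issue the same test case twice
                issued.add(new_test)
                pending.append(new_test)
    return executed


# Toy usage: a "trace" is the test itself; each trace yields two longer tests.
order = concolic_loop("", lambda t: t,
                      lambda tr: [tr + "a", tr + "b"] if len(tr) < 2 else [])
print(order)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The only coupling between the two halves is the trace passed from run_and_trace to generate_tests, mirroring the loose coupling the design aims for.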
Fig. 1. CRETE architecture (components shown: Config file + Target Binary, CRETE Runner, OS/Drivers/Libraries, Virtual Machine, CRETE Tracer, Captured Trace, CRETE Manager, Selected Trace, CRETE Replayer, Symbolic Execution Engine)


— Execution traces of a SUT are captured at the binary level in its unmodified
execution environment. The tracing plugin can be an extension of a VM
(Sect. 4.1), a hardware tracing facility, or a dynamic binary instrumentation
tool, such as PIN [29] or DynamoRIO [30].

— The concrete and symbolic execution environments are decoupled by standard- 
ized traces (Sect. 4.2). As long as they can generate and consume standardized 
traces, they can work together as a cohesive concolic process. 

— Optimization can be explored on both tracing and test case generation, for 
example, selective binary-level tracing to improve scalability (Sect. 4.3), and 
concolic test generation to reduce test case redundancy (Sect. 4.4). This makes 
high-coverage test generation on binary-level possible. 

— The tracing plugin is transparent to existing testing processes, as it only col- 
lects information. Therefore, no change is made to the testing processes. 


4 Design 


In this section, we present the design of CRETE with a VM as the concrete
execution environment. We select a VM because it allows complete access to the
whole system for tracing runtime execution states, and because mature
open-source VMs are widely available.


4.1 CRETE Architecture 


As shown in Fig. 1, CRETE has four key components: CRETE Runner, a tiny
helper program executing in the guest OS of the VM, which parses the
configuration file and launches the target binary program (TBP) with the
configuration and test cases; CRETE Tracer, a comprehensive tracing plug-in in
the VM, which captures binary-level traces from the concrete execution of the
TBP in the VM; CRETE Replayer, an extension of the SEE, which enables the
SEE to perform concolic execution on the captured traces and to generate test
cases; and CRETE Manager, a coordinator that integrates the VM and the SEE,
which manages captured runtime traces and generated test cases, coordinates the
concrete and symbolic execution in the VM and the SEE, and iteratively explores
the TBP. CRETE takes a TBP and a configuration file as inputs, and outputs
generated test cases along with a report of detected bugs. The manual effort and
learning curve to utilize CRETE are minimal: setting up the testing environment
for the TBP in a CRETE-instrumented VM is virtually no different from doing so
in a vanilla VM. The configuration file is an interface for users to configure
parameters for testing a TBP, especially the number and size of symbolic
command-line inputs and symbolic files for test case generation.


4.2 Standardized Runtime Trace 


To enable the modular and plug-and-play design of CRETE, a standardized
binary-level runtime trace format is needed. A trace in this format must capture
sufficient information from the concrete execution, so that the trace can be
faithfully replayed within the SEE. To integrate a concrete execution
environment into the CRETE framework, only a plug-in for that environment
needs to be developed, so that the concrete execution trace can be stored in the
standard file format. Similarly, to integrate a SEE into CRETE, the engine only
needs to be adapted to consume trace files in that format.

We define the standardized runtime trace format based on the LLVM assembly
language [31]. The reasons for selecting the LLVM instruction set are: (1) it has
become a de facto standard for compiler design and program analysis [25,32];
(2) there are many program analysis tools based on LLVM assembly language
[5,33-35]. A standardized binary-level runtime trace is packed as a
self-contained LLVM module that is directly consumable by an LLVM
interpreter. It is composed of (1) a set of assembly-level basic blocks in the
format of LLVM functions; (2) a set of hardware states in the format of LLVM
global variables; (3) a set of CRETE-defined helper functions in LLVM assembly;
and (4) a main function in LLVM assembly. The set of assembly-level basic blocks
is captured from a concrete execution of a TBP. It is normally translated from
another format (such as QEMU-IR) into LLVM assembly, and each basic block is
packed as an LLVM function. The set of hardware states consists of runtime
states along the execution of the TBP: CPU states, memory states, and possibly
states of other hardware components, which are packed as LLVM global
variables. The helper functions are provided by CRETE to correlate captured
hardware states with captured basic blocks and to open an interface to the SEE.
The main function represents the concrete execution path of the TBP. It contains
a sequence of calls to captured basic blocks (LLVM functions) and calls to
CRETE-defined helper functions with appropriate hardware states (LLVM global
variables).

An example of a standardized runtime trace of CRETE is shown in Fig. 2. The
first column of the figure is a complete execution path of a program with given
concrete inputs, written in assembly-level pseudo-code. Assume that basic
blocks BB_1 and BB_3 are of interest and are captured by CRETE Tracer, while
the other basic blocks are not (see Sect. 4.3 for details). As shown in the second
and third columns of the figure, hardware states are captured in two categories:
the initial state, and the side effects of basic blocks that are not captured. As
shown in the fourth column, captured basic blocks are packed as LLVM functions,
and captured hardware states are packed as LLVM global variables in the
standardized trace. A main function is also added, making the trace a
self-contained LLVM module. The main function first invokes CRETE helper
functions to initialize hardware states, and then calls into the first basic block
LLVM function. Before it calls into the second basic block LLVM function, the
main function invokes CRETE helper functions to update hardware states. For
example, before calling asm_BB_3, it calls function sync_state to update register
r1 and memory location 0x5678, which are the side effects brought by BB_2.

Concrete execution path, with captured initial HW state and HW side effects:
      initial CPU state captured here: r0, r1, ..., rn
BB_1  (1)  mem_ld r1, [0x1234]     initial memory: [0x1234]
      (2)  add r1, r0
      (3)  mem_st [0x1234], r1
      (4)  br r1, inst_5, xxx
BB_2  (5)  mem_ld r1, [0x5678]     side effect (CPU): r1
      (6)  add r1, r0              side effect (CPU): r1
      (7)  mem_st [0x5678], r1     side effect (memory): [0x5678]
      (8)  jump inst_9
BB_3  (9)  mem_ld r0, [0x1234]     initial memory: [0x1234]
      (10) add r1, r0
      (11) mem_st [0x5678], r1
      (12) br r0, xxx, inst_13
BB_4  (13) nop
      (14) nop

Standardized trace as an LLVM module:
  @init_state = {<r0, r1, ..., rn>, <[0x1234]>}
  @side_effects = {<r1>, <[0x5678]>}
  define asm_BB_1() {
    %1 = load * 0x1234
    %2 = getelementptr @init_state, r0_offset
    ...
    %4 = add %1, %3
    %5 = getelementptr @init_state, r1_offset
    ...
    store %4, * 0x1234
    br %4, %.path_true, %.path_false
  %.path_true:
    ...
  %.path_false:
    ...
  }
  ; asm_BB_3() is omitted
  external sync_state()  ; CRETE helper function
  define main() {
    call sync_state(@init_state)
    call asm_BB_1()
    call sync_state(@side_effects)
    call asm_BB_3()
  }

Fig. 2. Example of standardized runtime trace
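To make the role of the main function concrete, the sketch below models the basic blocks of Fig. 2 as Python functions over a toy machine (branches omitted, one integer per memory address) and checks that running only BB_1 and BB_3, with sync_state applying the initial state and BB_2's side effects, reproduces the final state of the full concrete run. The toy semantics, the concrete input values, and the Python names are assumptions for illustration, not CRETE's trace format; the side-effect values are the ones a tracer would record for this particular input.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

HwState = Tuple[Dict[str, int], Dict[int, int]]  # (registers, memory)


@dataclass
class Machine:
    regs: Dict[str, int] = field(default_factory=dict)
    mem: Dict[int, int] = field(default_factory=dict)


def asm_bb_1(m: Machine) -> None:
    m.regs["r1"] = m.mem[0x1234]       # (1) mem_ld r1, [0x1234]
    m.regs["r1"] += m.regs["r0"]       # (2) add r1, r0
    m.mem[0x1234] = m.regs["r1"]       # (3) mem_st [0x1234], r1


def asm_bb_2(m: Machine) -> None:      # not captured by the tracer
    m.regs["r1"] = m.mem[0x5678]       # (5) mem_ld r1, [0x5678]
    m.regs["r1"] += m.regs["r0"]       # (6) add r1, r0
    m.mem[0x5678] = m.regs["r1"]       # (7) mem_st [0x5678], r1


def asm_bb_3(m: Machine) -> None:
    m.regs["r0"] = m.mem[0x1234]       # (9) mem_ld r0, [0x1234]
    m.regs["r1"] += m.regs["r0"]       # (10) add r1, r0
    m.mem[0x5678] = m.regs["r1"]       # (11) mem_st [0x5678], r1


def sync_state(m: Machine, state: HwState) -> None:
    """Overwrite registers and memory with captured values."""
    regs, mem = state
    m.regs.update(regs)
    m.mem.update(mem)


def concrete_run(regs: Dict[str, int], mem: Dict[int, int]) -> Machine:
    m = Machine(dict(regs), dict(mem))
    for block in (asm_bb_1, asm_bb_2, asm_bb_3):
        block(m)
    return m


def trace_main(init_state: HwState, side_effects: HwState) -> Machine:
    """Mirror of main() in Fig. 2: only BB_1 and BB_3 are executed."""
    m = Machine()
    sync_state(m, init_state)
    asm_bb_1(m)
    sync_state(m, side_effects)
    asm_bb_3(m)
    return m


full = concrete_run({"r0": 2, "r1": 0}, {0x1234: 10, 0x5678: 100})
replayed = trace_main(
    init_state=({"r0": 2, "r1": 0}, {0x1234: 10}),  # CPU registers + memory read by (1)
    side_effects=({"r1": 102}, {0x5678: 102}),      # last writes of (6) and (7)
)
print(full == replayed)  # True
```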


4.3 Selective Binary-Level Tracing 


A major part of a standardized trace consists of assembly-level basic blocks,
which are essentially binary-level instruction sequences representing a concrete
execution of a TBP. Capturing the complete execution of a TBP is both
challenging and unnecessary. First, software binaries can be very complex. If we
capture the complete execution, the trace file can be prohibitively large and
difficult for the SEE to consume and analyze. Second, an executing TBP
commonly invokes many runtime libraries (such as libc) that are of no interest to
testers. Therefore, an automated way of selecting the code of interest is needed.

CRETE utilizes Dynamic Taint Analysis (DTA) [36] to achieve selective tracing.
The DTA algorithm is a part of CRETE Tracer. It tracks the propagation of
tainted values, normally specified by users, during the execution of a program.
It works at the binary level with byte-wise granularity. Using the DTA algorithm,
CRETE Tracer captures only the basic blocks that operate on tainted values, and
captures only the side effects of the other basic blocks. For the example trace in
Fig. 2, if the tainted value comes from the user's input to the program and is
stored at memory location 0x1234, DTA selects basic blocks BB_1 and BB_3,
because both of them operate on tainted values, while the other two basic blocks
do not touch tainted values and are therefore not captured.
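The block selection can be reproduced with a small taint-propagation sketch over the instructions of Fig. 2. It is simplified to whole-location granularity (registers and memory addresses) instead of bytes, and the instruction encoding and the name select_blocks are hypothetical; a block is selected when any of its instructions reads a tainted location, and an instruction that overwrites a location with untainted data clears that location's taint.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple, Union

Location = Union[str, int]  # register name or memory address
# (destination or None, source locations); branches and nops have no destination
Instr = Tuple[Optional[Location], Tuple[Location, ...]]

FIG2_BLOCKS: Dict[str, List[Instr]] = {
    "BB_1": [("r1", (0x1234,)),        # (1) mem_ld r1, [0x1234]
             ("r1", ("r1", "r0")),     # (2) add r1, r0
             (0x1234, ("r1",)),        # (3) mem_st [0x1234], r1
             (None, ("r1",))],         # (4) br r1, ...
    "BB_2": [("r1", (0x5678,)),        # (5) mem_ld r1, [0x5678]
             ("r1", ("r1", "r0")),     # (6) add r1, r0
             (0x5678, ("r1",)),        # (7) mem_st [0x5678], r1
             (None, ())],              # (8) jump inst_9
    "BB_3": [("r0", (0x1234,)),        # (9) mem_ld r0, [0x1234]
             ("r1", ("r1", "r0")),     # (10) add r1, r0
             (0x5678, ("r1",)),        # (11) mem_st [0x5678], r1
             (None, ("r0",))],         # (12) br r0, ...
    "BB_4": [(None, ()), (None, ())],  # (13), (14) nop
}


def select_blocks(path: Sequence[str], blocks: Dict[str, List[Instr]],
                  tainted_inputs: Iterable[Location]) -> List[str]:
    """Return the basic blocks on `path` that read at least one tainted location."""
    tainted: Set[Location] = set(tainted_inputs)
    captured: List[str] = []
    for name in path:
        reads_taint = False
        for dst, srcs in blocks[name]:
            src_tainted = any(src in tainted for src in srcs)
            reads_taint = reads_taint or src_tainted
            if dst is not None:
                if src_tainted:
                    tainted.add(dst)       # taint flows from sources to destination
                else:
                    tainted.discard(dst)   # overwritten with untainted data
        if reads_taint:
            captured.append(name)
    return captured


print(select_blocks(["BB_1", "BB_2", "BB_3", "BB_4"], FIG2_BLOCKS, {0x1234}))
# ['BB_1', 'BB_3']
```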
CRETE Tracer captures the initial CPU state by taking a copy of it before the first basic block of interest is executed. The initial CPU state is normally a set of register values. As shown in Fig. 2, the initial CPU state is captured before instruction (1). Naively, the initial memory state could be captured in the same way; however, the typical size of memory makes it impractical to dump entirely. To minimize the trace size, CRETE Tracer captures only the parts of memory that are accessed by captured read instructions, such as instructions (1) and (9). Memory touched by captured write instructions, such as instructions (3) and (11), can be ignored, because the state of that part of memory is defined by the write instructions themselves, which are already captured. CRETE Tracer therefore monitors every memory read instruction of interest, capturing memory on the fly as needed. In the example above, there are two memory read instructions. CRETE Tracer monitors both of them, but keeps only the memory state taken at instruction (1) as part of the initial memory state, because instructions (1) and (9) access the same address.
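A minimal sketch of this on-the-fly memory capture, assuming a hypothetical list of `(kind, address, value)` accesses made by captured instructions. It snapshots an address only on its first read, and skips addresses whose content was already produced by a captured write. The write addresses and values are invented for illustration.

```python
from typing import Dict, List, Tuple

# Each captured memory access: (kind, address, value), kind is "read" or "write".
Access = Tuple[str, int, int]

def initial_memory(accesses: List[Access]) -> Dict[int, int]:
    """Collect the memory bytes the trace needs as its initial state."""
    initial: Dict[int, int] = {}
    written: set = set()  # addresses whose content a captured write defines
    for kind, addr, value in accesses:
        if kind == "write":
            written.add(addr)
        elif addr not in written and addr not in initial:
            # First read of memory not produced inside the trace: snapshot it.
            initial[addr] = value
    return initial

# Mirrors the prose: (1) and (9) read the same address, (3) and (11) write.
accesses = [
    ("read", 0x1234, 7),    # instruction (1)
    ("write", 0x2000, 42),  # instruction (3), hypothetical address
    ("read", 0x1234, 7),    # instruction (9): same address as (1)
    ("write", 0x2004, 1),   # instruction (11), hypothetical address
]
print(initial_memory(accesses))  # {4660: 7}, i.e. only address 0x1234
```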

Side effects on hardware state are captured by monitoring uncaptured instructions that write hardware state. In the example in Fig. 2, instructions (5) and (6) write CPU registers, which causes side effects on the CPU state. CRETE Tracer monitors those instructions and keeps the updated register values as part of the runtime trace. Because register r1 is updated twice by two instructions, only the last update is kept in the runtime trace. Similarly, CRETE Tracer captures the side effect on memory at address 0x5678 by monitoring instruction (7).
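The "last update wins" rule can be sketched as follows. This is a simplified model under stated assumptions: side effects are a flat list of `(location, new_value)` pairs from uncaptured instructions, and ordering relative to captured blocks is ignored. The register values are hypothetical.

```python
from typing import Dict, List, Tuple

# An uncaptured instruction's effect on hardware state: (location, new_value).
Effect = Tuple[str, int]

def collect_side_effects(effects: List[Effect]) -> Dict[str, int]:
    """Keep only the last value written to each location (last write wins)."""
    result: Dict[str, int] = {}
    for location, value in effects:
        result[location] = value  # later writes overwrite earlier ones
    return result

# Instructions (5) and (6) both update r1; instruction (7) writes memory 0x5678.
effects = [("r1", 3), ("r1", 9), ("mem[0x5678]", 0xFF)]
print(collect_side_effects(effects))  # {'r1': 9, 'mem[0x5678]': 255}
```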


4.4 Concolic Test Case Generation 


A standardized trace is a self-contained LLVM module that can be executed directly by an LLVM interpreter, and it also exposes interfaces through which the SEE can inject symbolic values for test case generation. Normally, an SEE injects symbolic values by making a source-code variable symbolic. Going from source code to machine code, references to variables by name become memory accesses by address. For instance, a reference to a concrete input variable of a program becomes an access to the piece of memory that stores the state of that input variable. CRETE therefore injects a self-defined helper function, crete_make_concolic, into the captured basic blocks while capturing the trace. This helper function provides the address and size of the piece of memory in which to inject symbolic values, along with a name that improves readability during test case generation. By intercepting this helper function, the SEE can introduce symbolic values at the right time and in the right place.
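A toy model of how a replayer might react to `crete_make_concolic`. The call encoding, the argument order `(address, size, name)`, and the function `collect_concolic_bytes` are assumptions made for illustration; the sketch only shows how address, size, and name together identify which concrete bytes become symbolic, while the concrete bytes are kept as seed values for concolic replay.

```python
from typing import Dict, List, Tuple

def collect_concolic_bytes(calls: List[Tuple], memory: Dict[int, int]) -> Dict[int, Tuple[str, int]]:
    """Map each address marked concolic to (symbol name, concrete seed byte)."""
    concolic: Dict[int, Tuple[str, int]] = {}
    for call in calls:
        if call[0] != "crete_make_concolic":
            continue  # other calls are replayed concretely (not modelled here)
        _, addr, size, name = call  # assumed argument order: address, size, name
        for i in range(size):
            # The symbol stands for the byte; the seed value is kept for replay.
            concolic[addr + i] = (f"{name}[{i}]", memory[addr + i])
    return concolic

memory = {0x1234: 0x41, 0x1235: 0x42, 0x1236: 0x00}
calls = [("helper_other",), ("crete_make_concolic", 0x1234, 2, "input")]
print(collect_concolic_bytes(calls, memory))
# {4660: ('input[0]', 65), 4661: ('input[1]', 66)}
```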

Fig. 3. Execution tree of the example trace from Fig. 2: (a) for concrete execution, (b) for symbolic execution, and (c) for concolic execution.

A standardized trace in CRETE represents only a single path of a TBP, as shown in Fig. 3(a). Test case generation on this trace with naive symbolic execution by the SEE would be ineffective, as it ignores the single-path nature of the trace. As illustrated in Fig. 3(b), naive symbolic replay of a CRETE trace produces a number of execution states and test cases that is exponential in the number of branches within the trace. As shown in Fig. 3(c), with concolic replay of a CRETE trace, the SEE in CRETE maintains only one execution state, requiring minimal memory, and generates a more compact set of test cases, whose number is linear in the number of branches in the trace. For a branch instruction in a captured basic block, if both paths are feasible given the constraints collected so far on the symbolic values, the SEE in CRETE keeps only the execution state of the path that was taken by the original concrete execution in the VM, adding the corresponding constraint of this branch instruction, while generating a test case for the other path by solving the constraints with the negated branch condition. This generated test case can later lead the TBP to a different execution path during concrete execution in the VM.
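The concolic replay policy can be sketched on a toy trace. Assumptions, all illustrative: the input is a single byte, each branch condition is a Python predicate, and a brute-force search over 0..255 stands in for the constraint solver a real SEE such as KLEE would call. The replay follows the seed's path with a single state and emits at most one test per branch.

```python
from typing import Callable, List, Optional

Predicate = Callable[[int], bool]

def solve(constraints: List[Predicate]) -> Optional[int]:
    """Stand-in for a constraint solver: brute-force a single input byte."""
    for x in range(256):
        if all(c(x) for c in constraints):
            return x
    return None

def concolic_replay(branches: List[Predicate], seed: int) -> List[int]:
    """Follow the seed's path with one state; emit a test per flippable branch."""
    path: List[Predicate] = []
    tests: List[int] = []
    for cond in branches:
        taken = cond(seed)
        # Constraint for the direction the seed actually took ...
        follow = cond if taken else (lambda x, c=cond: not c(x))
        # ... and for the opposite direction.
        flip = (lambda x, c=cond: not c(x)) if taken else cond
        new_input = solve(path + [flip])
        if new_input is not None:
            tests.append(new_input)
        path.append(follow)  # no forking: keep only the concrete path's state
    return tests

branches = [lambda x: x > 100, lambda x: x % 2 == 0, lambda x: x == 7]
print(concolic_replay(branches, seed=0))  # [101, 1]
```

With three branches, naive symbolic replay could explore up to 2^3 = 8 paths, whereas this replay yields at most three tests; here the third flip (x == 7 together with x even) is infeasible, so only two tests are produced.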


4.5 Bug and Runtime Vulnerability Detection 


CRETE detects bugs and runtime vulnerabilities in two ways. First, all native checks embedded in the SEE are evaluated during symbolic replay of the trace captured from concrete execution. If a check is violated, a bug report is generated and associated with the test case that was used in the VM to generate this trace. Second, since CRETE does not change the native testing process and simply provides additional test cases that can be applied in that process, all bug and vulnerability checks used in the native process remain effective in detecting bugs and vulnerabilities triggered by CRETE-generated test cases. For instance, Valgrind [26] can be used to detect memory-related bugs and vulnerabilities along the paths explored by CRETE test cases.


5 Implementation 


To demonstrate the practicality of CRETE, we have implemented its complete workflow with QEMU [23] as the frontend and KLEE [5] as the backend. To demonstrate the extensibility of CRETE, we have also developed a tracing plug-in for the 8051 emulator, which can readily replace QEMU.


CRETE Tracer for QEMU: To give CRETE the best potential for supporting the various guest platforms supported by QEMU, CRETE Tracer captures basic blocks in the QEMU-IR format. To convert captured basic blocks into the standardized trace format, we implemented a QEMU-IR to LLVM translator based on the x86-to-LLVM translator of S2E [37]. We offload this translation from runtime tracing into a separate offline process to reduce the runtime overhead of CRETE Tracer. QEMU maintains its own virtual states to emulate the physical hardware state of a guest platform. For example, it uses a virtual memory state and a virtual CPU state to emulate the states of physical memory and the CPU. These virtual states of QEMU are essentially source-level structs. CRETE Tracer captures hardware states by monitoring the runtime values of those structs maintained by QEMU. QEMU emulates hardware operations by manipulating those virtual states through corresponding helper functions defined in QEMU. CRETE Tracer captures the side effects on those virtual hardware states by monitoring the invocations of those helper functions. As a result, the captured initial hardware states are the runtime values of these QEMU structs, and the captured side effects are those that the uncaptured instructions have on these structs.
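The idea of monitoring helper invocations can be illustrated outside QEMU. In the sketch below, `VirtualCPU`, `helper_load_r1`, and the `monitored` wrapper are hypothetical stand-ins, not QEMU's structs or helpers: the initial state is a snapshot of the struct's runtime values, and each helper call is wrapped so that any field it changes is recorded as a side effect.

```python
import functools
from dataclasses import asdict, dataclass
from typing import Callable, List, Tuple

@dataclass
class VirtualCPU:
    """Stand-in for an emulator's source-level CPU struct (hypothetical fields)."""
    r0: int = 0
    r1: int = 0

side_effects: List[Tuple[str, int]] = []

def monitored(helper: Callable) -> Callable:
    """Wrap a helper so that changes it makes to the CPU struct are recorded."""
    @functools.wraps(helper)
    def wrapper(cpu: VirtualCPU, *args):
        before = asdict(cpu)
        result = helper(cpu, *args)
        for field, value in asdict(cpu).items():
            if before[field] != value:  # only fields that actually changed
                side_effects.append((field, value))
        return result
    return wrapper

@monitored
def helper_load_r1(cpu: VirtualCPU, value: int) -> None:
    cpu.r1 = value

cpu = VirtualCPU()
initial_state = asdict(cpu)  # initial hardware state: runtime struct values
helper_load_r1(cpu, 5)
print(initial_state, side_effects)  # {'r0': 0, 'r1': 0} [('r1', 5)]
```

Diffing the struct before and after each call is one simple choice; it does not record writes that store an unchanged value.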


CRETE Replayer for KLEE: KLEE takes as input LLVM modules compiled from C source code. Since the CRETE trace is a self-contained LLVM module, CRETE Replayer mainly needs to inject symbolic values and implement concolic test generation. To inject symbolic values, CRETE Replayer provides a special function handler for the CRETE interface function crete_make_concolic. Natively, KLEE is an online symbolic executor: it forks execution states at each feasible branch and explores all execution paths by maintaining multiple execution states simultaneously. To achieve concolic test generation, CRETE Replayer extends KLEE to generate test cases for feasible branches without forking states.


CRETE Tracer for 8051 Emulator: The 8051 emulator executes an 8051 binary
directly by interpreting its instructions sequentially. For each type of instruction, 
the emulator provides a helper function. Interpreting an instruction entails call- 
ing this function to compute and change the relevant registers and memory 
states. The tracing plug-in for the 8051 emulator extends the interpreter. When 
the interpreter executes an instruction, an LLVM call to its corresponding helper 
function is put in the runtime trace. The 8051 instruction-processing helper func- 
tions are compiled into LLVM and incorporated into the runtime trace serving as 
the helper functions that map the captured instructions to the captured runtime 
states. The initial runtime state is captured from the 8051 emulator before the 
first instruction is executed. The resulting trace is of the same format as that 
from QEMU and is readily consumable by KLEE. 
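The plug-in's recording scheme can be sketched with a tiny interpreter. This is a simplified model under stated assumptions: helper functions mutate a dictionary of registers, and each recorded `(helper name, operand)` pair stands in for the LLVM call the real plug-in would emit. MOV A,#data and ADD A,#data are real 8051 instructions, but only a tiny subset of their semantics (accumulator and carry) is modelled.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]  # register name -> value (a tiny subset of 8051 state)

def mov_a_imm(state: State, imm: int) -> None:
    """Helper for MOV A,#data."""
    state["A"] = imm & 0xFF

def add_a_imm(state: State, imm: int) -> None:
    """Helper for ADD A,#data (only the carry flag is modelled)."""
    total = state["A"] + imm
    state["C"] = 1 if total > 0xFF else 0
    state["A"] = total & 0xFF

HELPERS: Dict[str, Callable[[State, int], None]] = {
    "MOV A,#data": mov_a_imm,
    "ADD A,#data": add_a_imm,
}

def run_and_trace(program: List[Tuple[str, int]], state: State):
    """Interpret the program and record one helper call per instruction."""
    initial = dict(state)  # runtime state before the first instruction
    trace: List[Tuple[str, int]] = []
    for mnemonic, operand in program:
        helper = HELPERS[mnemonic]
        trace.append((helper.__name__, operand))  # stands in for an LLVM call
        helper(state, operand)
    return initial, trace, state

def replay(initial: State, trace: List[Tuple[str, int]]) -> State:
    """Re-execute the recorded helper calls from the captured initial state."""
    by_name = {f.__name__: f for f in HELPERS.values()}
    state = dict(initial)
    for name, operand in trace:
        by_name[name](state, operand)
    return state

initial, trace, final = run_and_trace(
    [("MOV A,#data", 200), ("ADD A,#data", 100)], {"A": 0, "C": 0})
print(trace)                            # [('mov_a_imm', 200), ('add_a_imm', 100)]
print(replay(initial, trace) == final)  # True
```

Replaying the recorded calls from the captured initial state reproduces the final state, which is the property that lets the backend consume the trace on its own.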


6 Evaluation 


In this section, we present the evaluation results of CRETE from its application to GNU COREUTILS [38] and TianoCore utility programs for UEFI BIOS [39]. These evaluations demonstrate that CRETE generates test cases that are as effective in achieving high code coverage as those of state-of-the-art tools for automated test case generation, and that it can detect serious, deeply embedded bugs.
6.1 GNU COREUTILS


Experiment Setup. GNU COREUTILS is a package of utilities widely used in Unix-like systems. The 87 programs from COREUTILS (version 6.10) contain 20,559 lines of code, 988 functions, and 14,450 branches according to lcov [40]. Program sizes range from 18 to 1,475 lines, from 2 to 120 functions, and from 6 to 1,272 branches. COREUTILS is an often-used benchmark for evaluating automated program analysis systems, including KLEE, MergePoint, and others [5,12,41]. This is why we chose it as the benchmark for comparison with KLEE and ANGR.

CRETE and ANGR generate test cases from program binaries without debug information, while KLEE requires program source code. To measure and compare the effectiveness of the test cases generated by the different systems, we rerun those tests on binaries compiled with coverage flags and calculate the code coverage with lcov. Note that we only calculate the coverage of the code in GNU COREUTILS itself, and do not compute the code coverage of library code.

We adopted the configuration parameters for those programs from KLEE's experiment instructions¹. As specified in the instructions, we ran KLEE on each program for one hour with a memory limit of 1 GB. We increased the memory limit to 8 GB for the experiment on ANGR, while using the same timeout of one hour. CRETE uses a different timeout strategy: a run ends when no new instructions have been covered within a given time bound. We set this timeout for CRETE to 15 min in this experiment. This timeout strategy was also used by DASE [41] for its evaluation on COREUTILS. We conducted our experiments on a desktop with an Intel Core i7-3770 3.40 GHz CPU and 16 GB of memory, running 64-bit Ubuntu 14.04.5. We built KLEE from release v1.3.0, released on November 30, 2016, with LLVM 3.4. We built ANGR from its mainline on GitHub at revision e7df250, committed on October 11, 2017. CRETE uses Ubuntu 12.04.5 as the guest OS for its VM frontend in our experiments.
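The stall-based timeout can be sketched as a loop that resets its timer whenever new coverage appears. The function name, the per-test coverage sets, and the injectable clock are assumptions for illustration; CRETE's actual loop generates tests dynamically rather than iterating over a fixed list.

```python
import time
from typing import Callable, Iterable, Set

def run_until_stalled(tests: Iterable[str],
                      execute: Callable[[str], Set[int]],
                      stall_seconds: float,
                      clock: Callable[[], float] = time.monotonic) -> Set[int]:
    """Run tests until no new instruction is covered for stall_seconds."""
    covered: Set[int] = set()
    last_progress = clock()
    for test in tests:
        if clock() - last_progress > stall_seconds:
            break  # coverage has stalled: stop exploring
        new = execute(test) - covered
        if new:
            covered |= new
            last_progress = clock()  # reset the stall timer on progress
    return covered

# Deterministic demo: a fake clock that advances by one unit per reading, and
# hypothetical per-test coverage sets of instruction addresses.
ticks = iter(range(1000))

def fake_clock() -> float:
    return next(ticks)

coverage = {"a": {1, 2}, "b": {2}, "c": {2}, "d": {3}}
result = run_until_stalled(["a", "b", "c", "d"], coverage.__getitem__,
                           stall_seconds=2, clock=fake_clock)
print(result)  # {1, 2}: the run stops before "d" because of the stall
```

Unlike a fixed wall-clock limit, this budget adapts to the program: exploration continues as long as it keeps paying off.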


Table 1. Comparison of overall and median coverage by KLEE, ANGR, and CRETE on 
COREUTILS. 


Cov.    | Line (%)              | Function (%)          | Branch (%)
        | KLEE  | ANGR  | CRETE | KLEE  | ANGR  | CRETE | KLEE  | ANGR  | CRETE
Overall | 70.48 | 66.79 | 74.32 | 78.54 | 79.05 | 83.00 | 58.23 | 54.26 | 63.18
Median  | 88.09 | 81.62 | 86.60 | 100   | 100   | 100   | 79.31 | 70.59 | 77.57


Comparison with KLEE and ANGR. As shown in Table 1, our experiments demonstrate that CRETE achieves test coverage comparable to KLEE's and generally outperforms ANGR. The major advantage of KLEE over CRETE is that it works on source code, with all semantic information available. When the program size is small, symbolic execution is capable of exploring all feasible paths


1 http://klee.github.io/docs/coreutils-experiments/. 


292 B. Chen et al. 


Table 2. Distribution comparison of coverage achieved by KLEE, ANGR, and CRETE on 
COREUTILS. 


Cov. Line Function Branch 


KLEE | ANGR | CRETE | KLEE | ANGR | CRETE | KLEE | ANGR | CRETE 
90-100% | 40 24 33 65 60 65 15 16 19 


80-90% | 15 22 25 12 8 10 27 12 17 
70-80% | 13 14 10 3 7 5 14 16 25 
60-70% | 9 12 10 2 4 3 9 15 6 
50-60% | 5 1 4 1 8 11 9 
40-50% 1 1 1 2 8 7 6 
0-40% 4 3 3 1 6 10 5 


with given resources, such as time and memory. This is why KLEE achieves high code coverage, such as line coverage over 90%, on more programs than CRETE, as shown in Table 2. However, KLEE must maintain execution states for all paths being explored at once, and this limitation becomes more severe as program size grows. Moreover, KLEE analyzes programs within its own virtual environment, using simplified models of the real execution environment. These models sometimes give KLEE an advantage by reducing the complexity of the TBP, but sometimes they are a disadvantage because they introduce an inaccurate environment. This is why CRETE generally catches up, as shown in Table 2. Specifically, CRETE achieves higher line coverage on 33 programs, lower on 31 programs, and the same on the other 23 programs. Figure 4(a) shows the coverage differences of CRETE over KLEE on all 87 COREUTILS programs. Note that our coverage results for KLEE differ from those in KLEE's paper. As discussed and reported in previous works [12,41], the differences are mainly due to major code changes in KLEE, an architecture change from 32-bit to 64-bit, and whether manual system call failures are introduced.

ANGR shares KLEE's limitations of having to maintain multiple states and to provide models of the execution environment, while it also shares CRETE's disadvantage of having no access to semantic information. Moreover, ANGR provides models of the environment at the machine level to support various platforms, which is more challenging than KLEE's models. In addition, we found and reported several crashes of ANGR during this evaluation, which also affected ANGR's results. This is why ANGR performs worse than both KLEE and CRETE in this experiment. Figure 4(b) shows the coverage differences of CRETE over ANGR on all 87 COREUTILS programs. While CRETE outperformed ANGR on the majority of the programs, there is one program, printf, on which ANGR achieved over 40% higher line coverage than CRETE, as shown in the leftmost column of Fig. 4(b). We found that this is because printf uses many string routines from libc to parse its inputs, and ANGR provides effective models for those string routines. Similarly, KLEE works much better on printf than CRETE.
Fig. 4. Line coverage difference on COREUTILS by CRETE over KLEE and ANGR: positive values mean CRETE is better, and negative values mean CRETE is worse.


Coverage Improvement over Seed Test Case. Since CRETE is a concolic testing framework, it needs an initial seed test case to start testing a TBP. The goal of this experiment is to show that CRETE can significantly increase the coverage achieved by the seed test case that the user provides. To demonstrate the effectiveness of CRETE, we set the non-file arguments, the content of the input file, and stdin to zeros as the seed test case. Of course, well-crafted test cases from users would be more meaningful and effective as initial test cases. Figure 5 shows the coverage improvement for each program. On average, the initial seed test case covers 17.61% of lines, 29.55% of functions, and 11.11% of branches. CRETE improves line coverage by 56.71%, function coverage by 53.44%, and branch coverage by 52.14%. The overall coverage improvement on all 87 COREUTILS programs is significant.
Fig. 5. Coverage improvement over seed test case by CRETE on GNU COREUTILS.


Bug Detection. In our experiment on COREUTILS, CRETE was able to detect all three bugs, in mkdir, mkfifo, and mknod, that were detected by KLEE. This demonstrates that CRETE does not sacrifice bug detection capability while working directly on binaries without debug and high-level semantic information.
6.2 TianoCore Utilities


Experiment Setup. TianoCore utility programs are part of the open-source project EDK2 [42], a cross-platform firmware development environment from Intel. It includes 16 command-line programs used to build BIOS images. The TianoCore utility programs we evaluated are from its mainline on GitHub at revision 75ce7ef, committed on April 19, 2017. According to lcov, the 16 TianoCore utility programs contain 8,086 lines of code, 209 functions, and 4,404 branches. Note that we only calculate the coverage of the code of the TianoCore utility programs themselves, and do not compute the coverage of libraries.

The configuration parameters we used for these utility programs are based on our rough, high-level understanding of the programs from their user manuals. We assigned each program one long argument of 16 bytes and four short arguments of 2 bytes, along with a file of 10 KB. We conducted our experiments on the same platform, with the same host and guest OS, as in the COREUTILS evaluation, and also set the timeout to 15 min for each program.


High Coverage Test Generation From Scratch. For all the arguments and file contents in the parameter configuration, we set their initial values to binary zeros to serve as the seed test case of CRETE. Figure 6 shows that CRETE delivered high code coverage, above 80% line coverage, on 9 out of 16 programs. On average, the initial seed test case covers 14.56% of lines, 28.71% of functions, and 12.38% of branches. CRETE improves line coverage by 43.61%, function coverage by 41.63%, and branch coverage by 44.63%. Some programs achieved lower coverage because of: (1) inadequate configuration parameters; (2) error-handling code triggered only by failed system calls; and (3) symbolic indices for arrays and files, which CRETE does not handle well.
Fig. 6. Coverage improvement over seed test case by CRETE on TianoCore utilities.


Bug Detection. To further demonstrate CRETE's capability in detecting deeply embedded bugs, we performed a set of evaluations with CRETE on TianoCore utility programs focusing on concolic files. From the build process of a tutorial image, OvmfPkg, from EDK2, we extracted 509 invocations of TianoCore utility programs and the corresponding intermediate files generated; among these, 37 unique invocations cover 6 different programs. Taking the parameter configurations from those 37 invocations and using their files as seed files, we ran CRETE with a timeout of 2 h on each setup, with only the files made symbolic.
Table 3. Classified crashes found by CRETE on TianoCore utilities: 84 unique crashes from 8 programs

Crash type             | Count | Severity                         | Crashed programs
Stack corruption       | 1     | High (Exploitable)               | VfrCompile
Heap error             | 6     | High (Exploitable)               | GenFw
Write access violation | 23    | High (Exploitable)               | EfiLdrImage, GenFw, EfiRom, GenFfs
Abort signal           | 2     | Medium (Signs of exploitability) | GenFw
Read access violation  | 45    | Low (May not be exploitable)     | GenSec, GenFw, Split, GenCrc32, VfrCompile
Other access violation | 7     | Mixed                            | GenFw


Combining the experiments on concolic arguments and concolic files, CRETE found 84 distinct crashes (by stack hash) in eight TianoCore utility programs. We used a GDB extension [43] to classify the crashes, which is a popular way of classifying crashes among AFL users [44]. Table 3 shows that CRETE found various kinds of crashes, including many exploitable ones, such as stack corruption, heap errors, and write access violations. Eight crashes were found with concolic arguments, while the other 76 were found with concolic files. We reported all of these crashes to the TianoCore development team. So far, most of the crashes have been confirmed as real bugs, and ten of them have been fixed.

We now elaborate on a few sample crashes to demonstrate that the bugs found by CRETE are significant. VfrCompile crashed with a segmentation fault due to stack corruption when the input file name is malformed, e.g., '\\.%*a' as generated by CRETE. This bug is essentially a format string vulnerability. VfrCompile uses the function vsprintf() to compose a new string from a format string and stores it in a local array of fixed size. When the format string is malicious, like '%*a', vsprintf() keeps reading from the stack and the local buffer overflows, causing stack corruption. Note that CRETE generated a well-formed prefix for the input, '\\.', which is required to pass a preprocessing check in VfrCompile, so that the malicious format string can reach the vulnerable code.

CRETE also exposed several heap errors in GenFw by generating malformed input files. GenFw is used to generate a firmware image from an input file. The input file needs to follow a very precise file format, because GenFw checks signature bytes to decide the input file type, uses complex nested structs to parse different sections of the file, and conducts many checks to ensure that the input file is well-formed. Starting from a 223 KB seed file extracted from EDK2's build process, CRETE automatically mutated 29 bytes in the file header. The mutated bytes introduced a particular combination of file signature and of sizes and offsets of different sections of the file. This combination passed all checks on the file format and directed GenFw to a vulnerable function that mistakenly replaces the buffer already allocated for storing the input file with a much smaller buffer. Subsequent accesses to this malformed buffer caused an overflow and heap corruption.
7 Conclusions and Future Work


In this paper, we have presented CRETE, a versatile binary-level concolic testing framework designed with an open and highly extensible architecture that allows easy integration of concrete execution frontends and symbolic execution backends. At the core of this architecture is a standardized format for binary-level execution traces, which is LLVM-based, self-contained, and composable. Standardized execution traces are captured by concrete execution frontends, providing succinct and sufficient information for symbolic execution backends to reproduce the concrete executions. We have implemented CRETE with KLEE as the symbolic execution engine and multiple concrete execution frontends, such as QEMU and an 8051 emulator. The evaluation on COREUTILS programs shows that CRETE achieved code coverage comparable to that of KLEE directly analyzing the source code of COREUTILS, and generally outperformed ANGR. The evaluation on TianoCore utility programs found numerous exploitable bugs.

We are assembling a suite of 8051 binaries for evaluating CRETE and will report the results in the near future. Also as future work, we will develop new CRETE tracing plug-ins, e.g., for concrete execution on physical machines based on Pin. With these new plug-ins, we will focus on synthesizing abstract system-level traces from trace segments captured from binaries executing on various platforms. Another technical challenge that we plan to address is how to handle symbolic indices for arrays and files, so that code coverage can be further improved.
Abstract. Variational systems allow effective building of many custom variants by using features (configuration options) to mark the variable functionality. In many applications, their quality assurance and formal verification are of paramount importance. Family-based model checking allows simultaneous verification of all variants of a variational system in a single run by exploiting the commonalities between the variants. Yet, its computational cost still greatly depends on the number of variants (often huge).

In this work, we show how to achieve efficient family-based model 
checking of CTL* temporal properties using variability abstractions and 
off-the-shelf (single-system) tools. We use variability abstractions for 
deriving abstract family-based model checking, where the variability 
model of a variational system is replaced with an abstract (smaller) 
version of it, called modal featured transition system, which preserves 
the satisfaction of both universal and existential temporal properties, 
as expressible in CTL*. Modal featured transition systems contain 
two kinds of transitions, termed may and must transitions, which are 
defined by the conservative (over-approximating) abstractions and their 
dual (under-approximating) abstractions, respectively. The variability 
abstractions can be combined with different partitionings of the set of 
variants to infer suitable divide-and-conquer verification plans for the 
variational system. We illustrate the practicality of this approach for 
several variational systems. 


1 Introduction 


Variational systems appear in many application areas and for many reasons. Efficient methods to achieve customization, such as Software Product Line Engineering (SPLE) [8], use features (configuration options) to control the presence and absence of variable functionality [1]. Family members, called variants of a variational system, are specified in terms of the features selected for that particular variant. The reuse of code common to multiple variants is maximized. The SPLE method is particularly popular in the embedded and critical system domain (e.g., cars, phones). In these domains, rigorous verification and analysis are very important. Among the methods included in current practices, model checking [2] is a well-studied technique used to establish that temporal logic properties hold for a system.

Variability and SPLE are major enablers, but also a source of complexity. 
Obviously, the size of the configuration space (number of variants) is the lim- 
iting factor to the feasibility of any verification technique. Exponentially many 
variants can be derived from few configuration options. This problem is referred 
to as the configuration space explosion problem. A simple “brute-force” application of a single-system model checker to each variant is infeasible for realistic variational systems, due to the sheer number of variants. It is also inefficient, because the same execution behavior is checked multiple times whenever it is shared by several variants. Another, more efficient, verification technique [5,6]
is based on using compact representations for modelling variational systems, 
which incorporate the commonality within the family. We will call these repre- 
sentations variability models (or featured transition systems). Each behavior in 
a variability model is associated with the set of variants able to produce it. A 
specialized family-based model checking algorithm executed on such a model checks an execution behavior only once, regardless of how many variants include
it. These algorithms model check all variants simultaneously in a single run and 
pinpoint the variants that violate properties. Unfortunately, their performance 
still heavily depends on the size and complexity of the configuration space of 
the analyzed variational system. Moreover, maintaining specialized family-based 
tools is also an expensive task. 

In order to address these challenges, we propose to use standard, single- 
system model checkers with an alternative, externalized way to combat the con- 
figuration space explosion. We apply the so-called variability abstractions to a 
variability model which is too large to handle (“configuration space explosion”),
producing a more abstract model, which is smaller than the original one. We 
abstract from certain aspects of the configuration space, so that many of the con- 
figurations (variants) become indistinguishable and can be collapsed into a single 
abstract configuration. The abstract model is constructed in such a way that if 
some property holds for this abstract model it will also hold for the concrete 
model. Our technique extends the scope of existing over-approximating variability abstractions [14,19], which currently support the verification of universal properties only (LTL and ∀CTL). Here we construct abstract variability models which can be used to check arbitrary formulae of CTL*, thus including arbitrary
nested path quantifiers. We use modal featured transition systems (MFTSs) for 
representing abstract variability models. MFTSs are featured transition systems 
(FTSs) with two kinds of transitions, must and may, expressing behaviours that 
necessarily occur (must) or possibly occur (may). We use the standard conser- 
vative (over-approximating) abstractions to define may transitions, and their 
dual (under-approximating) abstractions to define must transitions. Therefore, MFTSs perform both over- and under-approximation, allowing both universal and existential properties to be deduced. Since MFTSs preserve all CTL* properties, we can verify any such property on the concrete variability model (which is given as an FTS) by verifying it on an abstract MFTS. Any model checking problem on modal transition systems (resp., MFTSs) can be reduced to two traditional model checking problems on standard transition systems (resp., FTSs). The overall technique relies on partitioning and abstracting concrete FTSs until we obtain models with so little variability (or no variability) that it is feasible to complete their model checking in brute-force fashion using standard single-system model checkers. Compared to family-based model checking, experiments show that the proposed technique achieves performance gains.


2 Background 


In this section, we present the background used in later developments. 


Modal Featured Transition Systems. Let $\mathbb{F} = \{A_1, \ldots, A_n\}$ be a finite set of Boolean variables representing the features available in a variational system. A specific subset of features, $k \subseteq \mathbb{F}$, known as a configuration, specifies a variant (valid product) of a variational system. We assume that only a subset $\mathbb{K} \subseteq 2^{\mathbb{F}}$ of configurations are valid. An alternative representation of configurations is based upon propositional formulae. Each configuration $k \in \mathbb{K}$ can be represented by a formula $k(A_1) \wedge \cdots \wedge k(A_n)$, where $k(A_i) = A_i$ if $A_i \in k$, and $k(A_i) = \neg A_i$ if $A_i \notin k$, for $1 \le i \le n$. We will use both representations interchangeably.
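The two representations of configurations can be made concrete with a small sketch. The feature names A1 to A3 and the validity constraint ("A2 requires A1") are hypothetical; the code enumerates $2^{\mathbb{F}}$, filters a valid subset $\mathbb{K}$, and renders a configuration as the formula $k(A_1) \wedge \cdots \wedge k(A_n)$.

```python
from itertools import combinations
from typing import FrozenSet, List

FEATURES = ["A1", "A2", "A3"]  # hypothetical feature names

def all_configurations(features: List[str]) -> List[FrozenSet[str]]:
    """Enumerate 2^F: every subset of the feature set."""
    return [frozenset(c)
            for r in range(len(features) + 1)
            for c in combinations(features, r)]

def as_formula(k: FrozenSet[str], features: List[str]) -> str:
    """k(A1) ∧ ... ∧ k(An), with k(Ai) = Ai if Ai ∈ k and ¬Ai otherwise."""
    return " ∧ ".join(a if a in k else "¬" + a for a in features)

# Hypothetical validity constraint (a feature model): A2 requires A1.
VALID = [k for k in all_configurations(FEATURES) if "A2" not in k or "A1" in k]

print(len(all_configurations(FEATURES)), len(VALID))  # 8 6
print(as_formula(frozenset({"A1", "A3"}), FEATURES))   # A1 ∧ ¬A2 ∧ A3
```

Even this toy case shows the configuration space doubling with each added feature, which is the growth the introduction refers to.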

We recall the basic definition of a transition system (TS) and a modal tran- 
sition system (MTS) that we will use to describe behaviors of single-systems. 


Definition 1. A transition system (TS) is a tuple $\mathcal{T} = (S, Act, trans, I, AP, L)$, where $S$ is a set of states; $Act$ is a set of actions; $trans \subseteq S \times Act \times S$ is a transition relation; $I \subseteq S$ is a set of initial states; $AP$ is a set of atomic propositions; and $L : S \to 2^{AP}$ is a labelling function specifying which propositions hold in a state. We write $s_1 \xrightarrow{\lambda} s_2$ whenever $(s_1, \lambda, s_2) \in trans$.


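An execution (behaviour) of a TS $\mathcal{T}$ is an infinite sequence $\rho = s_0 \lambda_1 s_1 \lambda_2 \ldots$ with $s_0 \in I$ such that $s_i \xrightarrow{\lambda_{i+1}} s_{i+1}$ for all $i \ge 0$. The semantics of the TS $\mathcal{T}$, denoted as $[\![\mathcal{T}]\!]_{TS}$, is the set of its executions.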
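Since executions are infinite, a program can only enumerate their finite prefixes. The sketch below encodes Definition 1 as a named tuple and lists all length-$n$ prefixes $s_0 \lambda_1 s_1 \ldots \lambda_n s_n$ starting in $I$. The two-state system and its action names are invented for illustration and are not taken from the paper's examples.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple

class TS(NamedTuple):
    S: Set[str]
    Act: Set[str]
    trans: Set[Tuple[str, str, str]]   # (s1, λ, s2)
    I: Set[str]
    AP: Set[str]
    L: Dict[str, FrozenSet[str]]

def prefixes(ts: TS, n: int) -> List[Tuple[str, ...]]:
    """Length-n prefixes s0 λ1 s1 ... λn sn of executions of ts (s0 ∈ I)."""
    paths = [(s,) for s in sorted(ts.I)]
    for _ in range(n):
        paths = [p + (a, t) for p in paths
                 for (s, a, t) in sorted(ts.trans) if s == p[-1]]
    return paths

# Toy system (hypothetical states, actions, and labels).
ts = TS(S={"s0", "s1"}, Act={"pay", "serve"},
        trans={("s0", "pay", "s1"), ("s1", "serve", "s0")},
        I={"s0"}, AP={"paid"}, L={"s0": frozenset(), "s1": frozenset({"paid"})})
print(prefixes(ts, 2))  # [('s0', 'pay', 's1', 'serve', 's0')]
```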

MTSs [26] are a generalization of transition systems that allows describ- 
ing not just a sum of all behaviors of a system but also an over- and under- 
approximation of the system’s behaviors. An MTS is a TS equipped with two 
transition relations: must and may. The former (must) is used to specify the required behavior, while the latter (may) is used to specify the allowed behavior of a system.


Definition 2. A modal transition system (MTS) is represented by a tuple $\mathcal{M} = (S, Act, trans^{may}, trans^{must}, I, AP, L)$, where $trans^{may} \subseteq S \times Act \times S$ describes the may transitions of $\mathcal{M}$, and $trans^{must} \subseteq S \times Act \times S$ describes the must transitions of $\mathcal{M}$, such that $trans^{must} \subseteq trans^{may}$.
The intuition behind the inclusion $trans^{must} \subseteq trans^{may}$ is that transitions that are necessarily true ($trans^{must}$) are also possibly true ($trans^{may}$). A may-execution in $\mathcal{M}$ is an execution with all its transitions in $trans^{may}$, whereas a must-execution in $\mathcal{M}$ is an execution with all its transitions in $trans^{must}$. We use $[\![\mathcal{M}]\!]^{may}_{MTS}$ to denote the set of all may-executions in $\mathcal{M}$, and $[\![\mathcal{M}]\!]^{must}_{MTS}$ to denote the set of all must-executions in $\mathcal{M}$.
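A small sketch of Definition 2, under illustrative assumptions: transitions are `(source, action, target)` triples, the well-formedness condition $trans^{must} \subseteq trans^{may}$ is checked explicitly, and finite prefixes stand in for the infinite may- and must-executions. The states and actions are hypothetical.

```python
from typing import List, Set, Tuple

Transition = Tuple[str, str, str]  # (source, action, target)

def check_modal(may: Set[Transition], must: Set[Transition]) -> None:
    """Definition 2 requires trans^must ⊆ trans^may."""
    if not must <= may:
        raise ValueError("every must transition must also be a may transition")

def prefixes(trans: Set[Transition], initial: Set[str], n: int) -> List[Tuple[str, ...]]:
    """Length-n execution prefixes using only the given transitions."""
    paths = [(s,) for s in sorted(initial)]
    for _ in range(n):
        paths = [p + (a, t) for p in paths
                 for (s, a, t) in sorted(trans) if s == p[-1]]
    return paths

may = {("s0", "pay", "s1"), ("s0", "free", "s1"), ("s1", "serve", "s0")}
must = {("s0", "pay", "s1"), ("s1", "serve", "s0")}
check_modal(may, must)
# Every must-prefix is also a may-prefix, but not vice versa.
print(len(prefixes(may, {"s0"}, 2)), len(prefixes(must, {"s0"}, 2)))  # 2 1
```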

An FTS describes the behavior of a whole family of systems in a superimposed manner. This means that it combines models of many variants in a single monolithic description, where the transitions are guarded by a presence condition that identifies the variants they belong to. The presence conditions $\psi$ are drawn from the set of feature expressions, $FeatExp(\mathbb{F})$, which are propositional logic formulae over $\mathbb{F}$: $\psi ::= true \mid A \in \mathbb{F} \mid \neg\psi \mid \psi_1 \wedge \psi_2$. The presence condition $\psi$ of a transition specifies the variants in which the transition is enabled. We write $[\![\psi]\!]$ to denote the set of variants from $\mathbb{K}$ that satisfy $\psi$, i.e. $k \in [\![\psi]\!]$ iff $k \models \psi$.
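A minimal Python sketch of feature expressions and of the denotation $[\![\psi]\!]$, assuming a tuple encoding of formulae and configurations represented as sets of enabled features; the disjunction case is added only for convenience, and the features A and B are hypothetical.

```python
from itertools import combinations

# Assumed encoding of feature expressions as nested tuples:
# ("true",), ("feat", A), ("not", e), ("and", e1, e2); "or" is added for convenience.
def sat(k, e):
    """k |= e, where the configuration k is the set of enabled features."""
    tag = e[0]
    if tag == "true":
        return True
    if tag == "feat":
        return e[1] in k
    if tag == "not":
        return not sat(k, e[1])
    if tag == "and":
        return sat(k, e[1]) and sat(k, e[2])
    if tag == "or":
        return sat(k, e[1]) or sat(k, e[2])
    raise ValueError(f"unknown connective {tag!r}")

def denotation(e, K):
    """[[e]]: the configurations in K that satisfy e."""
    return {k for k in K if sat(k, e)}

F = ["A", "B"]
K = {frozenset(c) for r in range(len(F) + 1) for c in combinations(F, r)}
a_not_b = ("and", ("feat", "A"), ("not", ("feat", "B")))
print(denotation(a_not_b, K))  # {frozenset({'A'})}
```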


Definition 3. A featured transition system (FTS) is a tuple $\mathcal{F} = (S, Act, trans, I, AP, L, \mathbb{F}, \mathbb{K}, \delta)$, where $S$, $Act$, $trans$, $I$, $AP$, and $L$ are defined as in a TS; $\mathbb{F}$ is the set of available features; $\mathbb{K}$ is a set of valid configurations; and $\delta : trans \to FeatExp(\mathbb{F})$ is a total function decorating transitions with presence conditions (feature expressions).


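The projection of an FTS $\mathcal{F}$ to a variant $k \in \mathbb{K}$, denoted as $\pi_k(\mathcal{F})$, is the TS $(S, Act, trans', I, AP, L)$, where $trans' = \{t \in trans \mid k \models \delta(t)\}$. We lift the definition of projection to sets of configurations $\mathbb{K}' \subseteq \mathbb{K}$, denoted as $\pi_{\mathbb{K}'}(\mathcal{F})$, by keeping the transitions admitted by at least one of the configurations in $\mathbb{K}'$. That is, $\pi_{\mathbb{K}'}(\mathcal{F})$ is the FTS $(S, Act, trans', I, AP, L, \mathbb{F}, \mathbb{K}', \delta)$, where $trans' = \{t \in trans \mid \exists k \in \mathbb{K}'.\ k \models \delta(t)\}$. The semantics of an FTS $\mathcal{F}$, denoted as $[\![\mathcal{F}]\!]_{FTS}$, is the union of behaviours of the projections on all valid variants $k \in \mathbb{K}$, i.e. $[\![\mathcal{F}]\!]_{FTS} = \bigcup_{k \in \mathbb{K}} [\![\pi_k(\mathcal{F})]\!]_{TS}$.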
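The projections $\pi_k$ and $\pi_{\mathbb{K}'}$ can be sketched in Python as follows. For brevity, this sketch represents presence conditions as predicates over configurations (an assumption, not the paper's encoding), and the small FTS fragment with features v and c is hypothetical.

```python
# Presence conditions are encoded as Python predicates over configurations
# (an assumption made for brevity); the transitions form a hypothetical fragment.
def project(delta, k):
    """pi_k(F): the transitions t with k |= delta(t)."""
    return {t for t, psi in delta.items() if psi(k)}

def project_set(delta, K_sub):
    """pi_K'(F): the transitions admitted by at least one configuration in K'."""
    return {t for t, psi in delta.items() if any(psi(k) for k in K_sub)}

v = lambda k: "v" in k
c = lambda k: "c" in k
delta = {(1, "pay", 2): v, (2, "serve", 1): v,
         (2, "cancel", 3): c, (3, "return", 1): c}
K = [frozenset({"v"}), frozenset({"v", "c"})]

print(sorted(project(delta, frozenset({"v"}))))  # [(1, 'pay', 2), (2, 'serve', 1)]
```

By construction, $\pi_{\mathbb{K}'}$ keeps a transition exactly when some single projection $\pi_k$ with $k \in \mathbb{K}'$ keeps it.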

We will use modal featured transition systems (MFTSs) for representing abstractions of FTSs. MFTSs are a variability-aware extension of MTSs.


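Definition 4. A modal featured transition system (MFTS) is a tuple $\mathcal{MF} = (S, Act, trans^{may}, trans^{must}, I, AP, L, \mathbb{F}, \mathbb{K}, \delta^{may}, \delta^{must})$, where $trans^{may}$ and $\delta^{may} : trans^{may} \to FeatExp(\mathbb{F})$ describe the may transitions of $\mathcal{MF}$, and $trans^{must}$ and $\delta^{must} : trans^{must} \to FeatExp(\mathbb{F})$ describe the must transitions of $\mathcal{MF}$.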


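The projection of an MFTS $\mathcal{MF}$ to a variant $k \in \mathbb{K}$, denoted as $\pi_k(\mathcal{MF})$, is the MTS $(S, Act, trans'^{may}, trans'^{must}, I, AP, L)$, where $trans'^{may} = \{t \in trans^{may} \mid k \models \delta^{may}(t)\}$ and $trans'^{must} = \{t \in trans^{must} \mid k \models \delta^{must}(t)\}$. We define $[\![\mathcal{MF}]\!]^{may}_{MFTS} = \bigcup_{k \in \mathbb{K}} [\![\pi_k(\mathcal{MF})]\!]^{may}_{MTS}$ and $[\![\mathcal{MF}]\!]^{must}_{MFTS} = \bigcup_{k \in \mathbb{K}} [\![\pi_k(\mathcal{MF})]\!]^{must}_{MTS}$.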


Example 1. Throughout this paper, we will use a beverage vending machine as a running example [6]. Figure 1 shows the FTS of a VENDINGMACHINE family. It has five features, and each of them is assigned an identifying letter and a color. The features are: VendingMachine (denoted by letter v, in black), the mandatory base feature of purchasing a drink, present in all variants; Tea (t, in red), for serving tea; Soda (s, in green), for serving soda, which is a mandatory feature present in all variants; CancelPurchase (c, in brown), for canceling a purchase after a coin is entered; and FreeDrinks (f, in blue), for offering free drinks. Each transition is labeled by an action followed by a feature expression. For instance, the transition labeled with the action free and the feature expression f is included in variants where the feature f is enabled.

By combining various features, a number of variants of this VENDINGMACHINE can be obtained. Recall that v and s are mandatory features. The set of valid configurations is thus: $\mathbb{K}^{VM} = \{\{v,s\}, \{v,s,t\}, \{v,s,c\}, \{v,s,t,c\}, \{v,s,f\}, \{v,s,t,f\}, \{v,s,c,f\}, \{v,s,t,c,f\}\}$. Figure 2 shows the basic version of VENDINGMACHINE that only serves soda, which is described by the configuration $\{v,s\}$ (or, as a formula, $v \wedge s \wedge \neg t \wedge \neg c \wedge \neg f$), that is, the projection $\pi_{\{v,s\}}(\text{VENDINGMACHINE})$. It takes a coin, returns change, serves soda, and opens a compartment so that the customer can take the soda, before closing it again. Figure 3 shows an MTS. Must transitions are denoted by solid lines, while may transitions are denoted by dashed lines.
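Assuming, as the listed set implies, that the optional features t, c and f are otherwise unconstrained, $\mathbb{K}^{VM}$ can be generated mechanically, for instance with the following Python sketch.

```python
from itertools import product

MANDATORY = {"v", "s"}      # VendingMachine and Soda
OPTIONAL = ["t", "c", "f"]  # Tea, CancelPurchase, FreeDrinks

# Each valid configuration = the mandatory features plus any subset of the optional ones.
K_VM = {frozenset(MANDATORY | {f for f, on in zip(OPTIONAL, bits) if on})
        for bits in product([False, True], repeat=len(OPTIONAL))}

basic = frozenset({"v", "s"})  # the soda-only variant of Fig. 2
with_cancel = {k for k in K_VM if "c" in k}
print(len(K_VM), basic in K_VM, len(with_cancel))  # 8 True 4
```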


CTL* Properties. Computation Tree Logic* (CTL*) [2] is an expressive temporal logic for specifying system properties, which subsumes both CTL and LTL. CTL* state formulae $\Phi$ are generated by the following grammar:

$$\Phi ::= true \mid a \in AP \mid \neg a \mid \Phi_1 \wedge \Phi_2 \mid \forall\phi \mid \exists\phi, \qquad \phi ::= \Phi \mid \phi_1 \wedge \phi_2 \mid \bigcirc\phi \mid \phi_1\, \mathcal{U}\, \phi_2$$

where $\phi$ represents CTL* path formulae. Note that the CTL* state formulae $\Phi$ are given in negation normal form ($\neg$ is applied only to atomic propositions). Given $\Phi \in$ CTL*, we consider $\neg\Phi$ to be the equivalent CTL* formula given in negation normal form. Other derived temporal operators (path formulae) can be defined as well by means of syntactic sugar, for instance: $\Diamond\phi = true\, \mathcal{U}\, \phi$ ($\phi$ holds eventually), and $\Box\phi = \neg\Diamond\neg\phi$ ($\phi$ always holds). $\forall$CTL* and $\exists$CTL* are subsets of CTL* where the only allowed path quantifiers are $\forall$ and $\exists$, respectively.

We formalise the semantics of CTL* over a TS $\mathcal{T}$. We write $[\![\mathcal{T}]\!]^{s}_{TS}$ for the set of executions that start in state $s$; $\rho[i] = s_i$ to denote the $i$-th state of the execution $\rho$; and $\rho^i = s_i \lambda_{i+1} s_{i+1} \ldots$ for the suffix of $\rho$ starting from its $i$-th state.


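Definition 5. Satisfaction of a state formula $\Phi$ in a state $s$ of a TS $\mathcal{T}$, denoted $\mathcal{T}, s \models \Phi$, is defined as ($\mathcal{T}$ is omitted when clear from context):

(1) $s \models a$ iff $a \in L(s)$; $s \models \neg a$ iff $a \notin L(s)$,
(2) $s \models \Phi_1 \wedge \Phi_2$ iff $s \models \Phi_1$ and $s \models \Phi_2$,
(3) $s \models \forall\phi$ iff $\forall\rho \in [\![\mathcal{T}]\!]^{s}_{TS}.\ \rho \models \phi$; $s \models \exists\phi$ iff $\exists\rho \in [\![\mathcal{T}]\!]^{s}_{TS}.\ \rho \models \phi$.

Satisfaction of a path formula $\phi$ for an execution $\rho$ of a TS $\mathcal{T}$, denoted $\mathcal{T}, \rho \models \phi$, is defined as ($\mathcal{T}$ is omitted when clear from context):

(4) $\rho \models \Phi$ iff $\rho[0] \models \Phi$,
(5) $\rho \models \phi_1 \wedge \phi_2$ iff $\rho \models \phi_1$ and $\rho \models \phi_2$; $\rho \models \bigcirc\phi$ iff $\rho^1 \models \phi$; $\rho \models \phi_1\, \mathcal{U}\, \phi_2$ iff $\exists i \geq 0.\ (\rho^i \models \phi_2 \wedge (\forall 0 \leq j \leq i-1.\ \rho^j \models \phi_1))$.

A TS $\mathcal{T}$ satisfies a state formula $\Phi$, written $\mathcal{T} \models \Phi$, iff $\forall s_0 \in I.\ s_0 \models \Phi$.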


Definition 6. An FTS $\mathcal{F}$ satisfies a CTL* formula $\Phi$, written $\mathcal{F} \models \Phi$, iff all its valid variants satisfy the formula: $\forall k \in \mathbb{K}.\ \pi_k(\mathcal{F}) \models \Phi$.
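Definition 6 suggests a brute-force (variant-by-variant) check. The Python sketch below is illustrative only: it handles just the CTL shapes $\forall\Box\forall\Diamond$ (AG AF) and $\forall\Box\exists\Diamond$ (AG EF) via the standard fixpoint characterizations of AG, AF and EF, assumes that every reachable state has a successor (executions are infinite), and uses a hypothetical four-state FTS rather than the machine of Fig. 1.

```python
def succ(trans, s):
    """Successor states of s under the transition relation trans."""
    return {t for (u, _, t) in trans if u == s}

def ef(states, trans, goal):
    """Least fixpoint for EF goal: states from which some path reaches goal."""
    z = set(goal)
    while True:
        new = {s for s in states - z if succ(trans, s) & z}
        if not new:
            return z
        z |= new

def af(states, trans, goal):
    """Least fixpoint for AF goal: states all of whose paths reach goal."""
    z = set(goal)
    while True:
        new = {s for s in states - z if succ(trans, s) and succ(trans, s) <= z}
        if not new:
            return z
        z |= new

def ag(states, trans, inv):
    """Greatest fixpoint for AG inv: inv holds in every state along every path."""
    z = set(inv)
    while True:
        bad = {s for s in z if not succ(trans, s) <= z}
        if not bad:
            return z
        z -= bad

def failing_variants(states, init, delta, K, sat_set):
    """Definition 6 by brute force: the k in K whose projection pi_k(F) violates the formula."""
    failing = set()
    for k in K:
        trans = {t for t, psi in delta.items() if psi(k)}  # pi_k(F)
        if not init <= sat_set(states, trans):
            failing.add(k)
    return failing

# Hypothetical toy FTS (not Fig. 1): with c, the loop 3 <-> 4 may avoid state 1 forever.
v = lambda k: "v" in k
c = lambda k: "c" in k
delta = {(1, "pay", 2): v, (2, "serve", 1): v, (2, "cancel", 3): c,
         (3, "wait", 4): c, (4, "retry", 3): c, (4, "return", 1): c}
states, init, start = {1, 2, 3, 4}, {1}, {1}
K = {frozenset({"v"}), frozenset({"v", "c"})}

ag_af = lambda S, T: ag(S, T, af(S, T, start))  # AG AF start
ag_ef = lambda S, T: ag(S, T, ef(S, T, start))  # AG EF start
print(failing_variants(states, init, delta, K, ag_af))  # only the variant with c
print(failing_variants(states, init, delta, K, ag_ef))  # set()
```

The cost of this approach grows linearly with $|\mathbb{K}|$, which is exactly what the abstractions introduced in Sect. 3 aim to avoid.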


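The interpretation of CTL* over an MTS $\mathcal{M}$ is defined slightly differently from Definition 5 above. In particular, clause (3) is replaced by:

(3') $s \models \forall\phi$ iff for every may-execution $\rho$ in the state $s$ of $\mathcal{M}$, that is $\forall\rho \in [\![\mathcal{M}]\!]^{may,s}_{MTS}$, it holds $\rho \models \phi$; whereas $s \models \exists\phi$ iff there exists a must-execution $\rho$ in the state $s$ of $\mathcal{M}$, that is $\exists\rho \in [\![\mathcal{M}]\!]^{must,s}_{MTS}$, such that $\rho \models \phi$.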


From now on, we implicitly assume this adapted definition when interpreting 
CTL* formulae over MTSs and MFTSs. 


Example 2. Consider the FTS VENDINGMACHINE in Fig. 1. Suppose that the proposition start holds in the initial state. An example property is $\Phi_1 = \forall\Box\forall\Diamond start$, which states that in every state along every execution all possible continuations will eventually reach the initial state. This formula is in $\forall$CTL*. Note that VENDINGMACHINE $\not\models \Phi_1$. For example, if the feature c (Cancel) is enabled, there is a counter-example execution that, after leaving the initial state, never reaches it again. The set of violating products is $[\![c]\!] = \{\{v,s,c\}, \{v,s,t,c\}, \{v,s,c,f\}, \{v,s,t,c,f\}\} \subseteq \mathbb{K}^{VM}$. However, $\pi_{[\![\neg c]\!]}(\text{VENDINGMACHINE}) \models \Phi_1$.

Consider the property $\Phi_2 = \forall\Box\exists\Diamond start$, which describes a situation where in every state along every execution there exists a possible continuation that will eventually reach the start state. This is a CTL* formula which is neither in $\forall$CTL* nor in $\exists$CTL*. Note that VENDINGMACHINE $\models \Phi_2$, since even for variants with the feature c there is always a continuation back to the initial state.

Consider the $\exists$CTL* property $\Phi_3 = \exists\Box\exists\Diamond start$, which states that there exists an execution such that in every state along it there exists a possible continuation that will eventually reach the start state. There is a witness execution for the variants that satisfy $\neg c$, and another one for the variants with c.
3 Abstraction of FTSs


We now introduce the variability abstractions which preserve full CTL* and its universal and existential properties. They simplify the configuration space of an FTS by reducing the number of configurations and manipulating presence conditions of transitions. We start by working with Galois connections¹ between Boolean complete lattices of feature expressions, and then induce a notion of abstraction of FTSs. We define two classes of abstractions. We use the standard conservative abstractions [14,15] as an instrument to eliminate variability from the FTS in an over-approximating way, that is, by adding more executions. We use the dual abstractions, which can also eliminate variability, but by under-approximating the given FTS, that is, by dropping executions.


Domains. The Boolean complete lattice of feature expressions (propositional formulae over $\mathbb{F}$) is $\langle FeatExp(\mathbb{F})_{/\equiv}, \models, \vee, \wedge, true, false, \neg\rangle$. The elements of the domain $FeatExp(\mathbb{F})_{/\equiv}$ are equivalence classes of propositional formulae $\psi \in FeatExp(\mathbb{F})$ obtained by quotienting by the semantic equivalence $\equiv$. The ordering $\models$ is the standard entailment between propositional logic formulae, whereas the least upper bound and the greatest lower bound are just logical disjunction and conjunction, respectively. Finally, the constant false is the least element, true is the greatest element, and negation is the complement operator.


Conservative Abstractions. The join abstraction, $\alpha^{join}$, merges the control-flow of all variants, obtaining a single variant that includes all executions occurring in any variant. The information about which transitions are associated with which variants is lost. Each feature expression $\psi$ is replaced with true if there exists at least one configuration from $\mathbb{K}$ that satisfies $\psi$. The new abstract set of features is empty: $\alpha^{join}(\mathbb{F}) = \emptyset$, and the abstract set of valid configurations is a singleton: $\alpha^{join}(\mathbb{K}) = \{true\}$ if $\mathbb{K} \neq \emptyset$. The abstraction and concretization functions between $FeatExp(\mathbb{F})$ and $FeatExp(\emptyset)$, forming a Galois connection [14,15], are defined as:

$$\alpha^{join}(\psi) = \begin{cases} true & \text{if } \exists k \in \mathbb{K}.\ k \models \psi \\ false & \text{otherwise} \end{cases} \qquad \gamma^{join}(\psi) = \begin{cases} true & \text{if } \psi \text{ is } true \\ \bigvee_{k \in 2^{\mathbb{F}} \setminus \mathbb{K}} k & \text{if } \psi \text{ is } false \end{cases}$$
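The join abstraction and its Galois property can be checked exhaustively on a small example. In this Python sketch, a feature expression is represented, up to equivalence, by the set of configurations satisfying it, so entailment becomes set inclusion; the features A, B and the set of valid configurations are hypothetical.

```python
from itertools import chain, combinations

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

F = ["A", "B"]
ALL = powerset(F)                               # 2^F: every configuration
K = {frozenset({"A"}), frozenset({"A", "B"})}   # hypothetical valid configurations

# A feature expression over F is represented semantically, up to equivalence,
# by the set of configurations satisfying it; the abstract domain is {False, True}.
def alpha_join(psi):
    return any(k in K for k in psi)             # true iff some valid k satisfies psi

def gamma_join(b):
    # true -> all configurations; false -> the disjunction of the invalid ones
    return frozenset(ALL) if b else frozenset(k for k in ALL if k not in K)

formulas = powerset(ALL)                        # all 16 classes of FeatExp(F)/==
# Galois condition: alpha(psi) |= m  <=>  psi |= gamma(m)
galois = all(((not alpha_join(psi)) or m) == (psi <= gamma_join(m))
             for psi in formulas for m in (False, True))
print(len(formulas), galois)  # 16 True
```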
¹ $\langle L, \leq_L\rangle \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} \langle M, \leq_M\rangle$ is a Galois connection between complete lattices $L$ (concrete domain) and $M$ (abstract domain) iff $\alpha : L \to M$ and $\gamma : M \to L$ are total functions that satisfy: $\alpha(l) \leq_M m \iff l \leq_L \gamma(m)$ for all $l \in L$, $m \in M$. Here $\leq_L$ and $\leq_M$ are the pre-order relations for $L$ and $M$, respectively. We will often simply write $(\alpha, \gamma)$ for any such Galois connection.

The feature ignore abstraction, $\alpha^{fignore}_A$, introduces an over-approximation by ignoring a single feature $A \in \mathbb{F}$. It merges the control flow paths that only differ with regard to $A$, but keeps the precision with respect to control flow paths that do not depend on $A$. The features and configurations of the abstracted model are: $\alpha^{fignore}_A(\mathbb{F}) = \mathbb{F} \setminus \{A\}$, and $\alpha^{fignore}_A(\mathbb{K}) = \{k[l_A \mapsto true] \mid k \in \mathbb{K}\}$, where $l_A$ denotes a literal of $A$ (either $A$ or $\neg A$), and $k[l_A \mapsto true]$ is a formula resulting from $k$ by substituting true for $l_A$. The abstraction and concretization functions between $FeatExp(\mathbb{F})$ and $FeatExp(\alpha^{fignore}_A(\mathbb{F}))$, forming a Galois connection [14,15], are:

$$\alpha^{fignore}_A(\psi) = \psi[l_A \mapsto true] \qquad \gamma^{fignore}_A(\psi') = (\psi' \wedge A) \vee (\psi' \wedge \neg A)$$

where $\psi$ and $\psi'$ need to be in negation normal form before substitution.
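Since the feature ignore abstraction is a syntactic substitution on formulae in negation normal form, it is straightforward to sketch. The Python encoding below (literals as `("lit", feature, polarity)`) and the example formula $A \wedge \neg B$ are assumptions for illustration; the final check confirms, for this example, that $\psi \models \gamma^{fignore}_A(\alpha^{fignore}_A(\psi))$.

```python
# Assumed NNF encoding: ("true",), ("false",), ("lit", A, positive), ("and", e1, e2), ("or", e1, e2)
def subst(e, feature, value):
    """e[l_A -> value]: replace both literals A and not-A by the constant value."""
    if e[0] == "lit" and e[1] == feature:
        return ("true",) if value else ("false",)
    if e[0] in ("and", "or"):
        return (e[0], subst(e[1], feature, value), subst(e[2], feature, value))
    return e

def sat(k, e):
    """k |= e for a configuration k given as the set of enabled features."""
    if e[0] in ("true", "false"):
        return e[0] == "true"
    if e[0] == "lit":
        return (e[1] in k) == e[2]
    left, right = sat(k, e[1]), sat(k, e[2])
    return (left and right) if e[0] == "and" else (left or right)

def alpha_fignore(e, a):
    return subst(e, a, True)

def gamma_fignore(e, a):
    return ("or", ("and", e, ("lit", a, True)), ("and", e, ("lit", a, False)))

psi = ("and", ("lit", "A", True), ("lit", "B", False))   # A and not B
K = [frozenset(), frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})]
abstract_K = {k - {"A"} for k in K}                      # alpha(K) = {k[l_A -> true]}

print(alpha_fignore(psi, "A"))  # ('and', ('true',), ('lit', 'B', False))
# Over-approximation: every k with k |= psi also satisfies gamma(alpha(psi)).
print(all(sat(k, gamma_fignore(alpha_fignore(psi, "A"), "A")) for k in K if sat(k, psi)))  # True
```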
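Dual Abstractions. Suppose that $\langle FeatExp(\mathbb{F})_{/\equiv}, \models\rangle$ and $\langle FeatExp(\alpha(\mathbb{F}))_{/\equiv}, \models\rangle$ are Boolean complete lattices, and $\langle FeatExp(\mathbb{F})_{/\equiv}, \models\rangle \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} \langle FeatExp(\alpha(\mathbb{F}))_{/\equiv}, \models\rangle$ is a Galois connection. We define [9]: $\tilde\alpha = \neg \circ \alpha \circ \neg$ and $\tilde\gamma = \neg \circ \gamma \circ \neg$, so that $\langle FeatExp(\mathbb{F})_{/\equiv}, \models^{-1}\rangle \underset{\tilde\alpha}{\overset{\tilde\gamma}{\leftrightarrows}} \langle FeatExp(\alpha(\mathbb{F}))_{/\equiv}, \models^{-1}\rangle$ is a Galois connection, where $\models^{-1}$ denotes reverse entailment (or, equivalently, $\langle FeatExp(\alpha(\mathbb{F}))_{/\equiv}, \models\rangle \underset{\tilde\gamma}{\overset{\tilde\alpha}{\leftrightarrows}} \langle FeatExp(\mathbb{F})_{/\equiv}, \models\rangle$). The obtained Galois connections $(\tilde\alpha, \tilde\gamma)$ are called dual (under-approximating) abstractions of $(\alpha, \gamma)$.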


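The dual join abstraction, $\tilde\alpha^{join}$, merges the control-flow of all variants, obtaining a single variant that includes only those executions that occur in all variants. Each feature expression $\psi$ is replaced with true if all configurations from $\mathbb{K}$ satisfy $\psi$. The abstraction and concretization functions between $FeatExp(\mathbb{F})$ and $FeatExp(\emptyset)$, forming a Galois connection, are defined as $\tilde\alpha^{join} = \neg \circ \alpha^{join} \circ \neg$ and $\tilde\gamma^{join} = \neg \circ \gamma^{join} \circ \neg$, that is:

$$\tilde\alpha^{join}(\psi) = \begin{cases} true & \text{if } \forall k \in \mathbb{K}.\ k \models \psi \\ false & \text{otherwise} \end{cases} \qquad \tilde\gamma^{join}(\psi) = \begin{cases} \bigwedge_{k \in 2^{\mathbb{F}} \setminus \mathbb{K}} (\neg k) & \text{if } \psi \text{ is } true \\ false & \text{if } \psi \text{ is } false \end{cases}$$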
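The definition $\tilde\alpha = \neg \circ \alpha \circ \neg$ can be checked against the closed form of the dual join abstraction on a small example. In this Python sketch each feature expression is the set of configurations satisfying it, so negation is complement; the features A, B and the valid configurations are hypothetical. The sketch also uses the fact that $\bigwedge_{k \in 2^{\mathbb{F}} \setminus \mathbb{K}} (\neg k)$ denotes exactly $\mathbb{K}$.

```python
from itertools import chain, combinations

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

F = ["A", "B"]
ALL = frozenset(powerset(F))                               # 2^F
K = frozenset({frozenset({"A"}), frozenset({"A", "B"})})   # hypothetical valid configurations

def neg(psi):
    return ALL - psi                 # negation = complement of the denotation

def alpha_join(psi):
    return bool(psi & K)             # some valid k satisfies psi

def alpha_join_dual(psi):
    return K <= psi                  # every valid k satisfies psi

def gamma_join_dual(b):
    # /\_{k in 2^F \ K} (not k) denotes exactly K; false denotes the empty set
    return K if b else frozenset()

formulas = powerset(ALL)
dual_ok = all(alpha_join_dual(psi) == (not alpha_join(neg(psi))) for psi in formulas)
# (gamma~, alpha~) is a Galois connection for |=: gamma~(m) |= psi <=> m |= alpha~(psi)
galois_ok = all((gamma_join_dual(m) <= psi) == ((not m) or alpha_join_dual(psi))
                for psi in formulas for m in (False, True))
print(dual_ok, galois_ok)  # True True
```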
The dual feature ignore abstraction, $\tilde\alpha^{fignore}_A$, introduces an under-approximation by ignoring the feature $A \in \mathbb{F}$, such that the literals of $A$ (that is, $A$ and $\neg A$) are replaced with false in feature expressions (given in negation normal form). The abstraction and concretization functions between $FeatExp(\mathbb{F})$ and $FeatExp(\alpha^{fignore}_A(\mathbb{F}))$, forming a Galois connection, are defined as $\tilde\alpha^{fignore}_A = \neg \circ \alpha^{fignore}_A \circ \neg$ and $\tilde\gamma^{fignore}_A = \neg \circ \gamma^{fignore}_A \circ \neg$, that is:

$$\tilde\alpha^{fignore}_A(\psi) = \psi[l_A \mapsto false] \qquad \tilde\gamma^{fignore}_A(\psi') = (\psi' \vee A) \wedge (\psi' \vee \neg A)$$

where $\psi$ and $\psi'$ are in negation normal form.
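The following Python sketch contrasts the two substitutions on the hypothetical formula $A \vee B$ when ignoring $A$: the conservative abstraction yields a formula equivalent to true, while the dual one yields a formula equivalent to $B$, which entails the original. The tuple encoding of NNF formulae is an assumption.

```python
from itertools import product

# Assumed NNF encoding: ("true",), ("false",), ("lit", A, positive), ("and", e1, e2), ("or", e1, e2)
def subst(e, feature, value):
    """e[l_A -> value]: replace both literals of the feature by a constant."""
    if e[0] == "lit" and e[1] == feature:
        return ("true",) if value else ("false",)
    if e[0] in ("and", "or"):
        return (e[0], subst(e[1], feature, value), subst(e[2], feature, value))
    return e

def sat(k, e):
    """k |= e for a configuration k given as the set of enabled features."""
    if e[0] in ("true", "false"):
        return e[0] == "true"
    if e[0] == "lit":
        return (e[1] in k) == e[2]
    left, right = sat(k, e[1]), sat(k, e[2])
    return (left and right) if e[0] == "and" else (left or right)

psi = ("or", ("lit", "A", True), ("lit", "B", True))  # A or B
over = subst(psi, "A", True)     # alpha^fignore_A(psi): true or B
under = subst(psi, "A", False)   # dual: psi[l_A -> false] = false or B

configs = [frozenset(f for f, on in zip("AB", bits) if on)
           for bits in product([False, True], repeat=2)]
print(under)                                               # ('or', ('false',), ('lit', 'B', True))
print(all(sat(k, over) for k in configs))                  # True: over is valid
print(all(sat(k, psi) for k in configs if sat(k, under)))  # True: under |= psi
```

The configuration $\{A\}$ satisfies $\psi$ but not the dual abstraction, showing the behaviour that the under-approximation drops.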


Abstract MFTS and Preservation of CTL*. Given a Galois connection $(\alpha, \gamma)$ defined on the level of feature expressions, we now define the abstraction of an FTS as an MFTS with two transition relations: one (may) preserving universal properties, and the other (must) preserving existential properties. The may transitions describe the behaviour that is possible, but need not be realized in the variants of the family, whereas the must transitions describe behaviour that has to be present in any variant of the family.


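Definition 7. Given the FTS $\mathcal{F} = (S, Act, trans, I, AP, L, \mathbb{F}, \mathbb{K}, \delta)$, we define the MFTS $\alpha(\mathcal{F}) = (S, Act, trans^{may}, trans^{must}, I, AP, L, \alpha(\mathbb{F}), \alpha(\mathbb{K}), \delta^{may}, \delta^{must})$ to be its abstraction, where $\delta^{may}(t) = \alpha(\delta(t))$, $\delta^{must}(t) = \tilde\alpha(\delta(t))$, $trans^{may} = \{t \in trans \mid \delta^{may}(t) \not\equiv false\}$, and $trans^{must} = \{t \in trans \mid \delta^{must}(t) \not\equiv false\}$.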
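For the join abstraction, Definition 7 reduces to two simple tests per transition, sketched below in Python under the assumption that presence conditions are predicates over configurations. The transitions, states, and presence conditions form a hypothetical fragment in the style of the vending machine; $\mathbb{K}$ is $\mathbb{K}^{VM}$ from Example 1.

```python
from itertools import product

# K^VM from Example 1: v and s mandatory, t, c, f optional.
K = {frozenset({"v", "s"} | {f for f, on in zip("tcf", bits) if on})
     for bits in product([False, True], repeat=3)}

# Hypothetical FTS fragment: transition -> presence condition (as a predicate).
delta = {
    (1, "pay", 2):    lambda k: "v" in k,
    (1, "free", 3):   lambda k: "f" in k,
    (2, "cancel", 1): lambda k: "c" in k,
    (3, "soda", 4):   lambda k: "s" in k,
    (3, "tea", 5):    lambda k: "t" in k,
}

def join_mfts(delta, K):
    """Definition 7 for the join abstraction: returns (trans^may, trans^must)."""
    may = {t for t, psi in delta.items() if any(psi(k) for k in K)}   # alpha^join(psi) != false
    must = {t for t, psi in delta.items() if all(psi(k) for k in K)}  # dual alpha^join(psi) != false
    return may, must

may, must = join_mfts(delta, K)
print(sorted(must))  # [(1, 'pay', 2), (3, 'soda', 4)]
print(must <= may)   # True, since K is non-empty
```

As in Example 3 below, only the transitions guarded by the mandatory features v and s end up in the must part.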
Note that the degree of reduction is determined by the choice of abstraction and may hence be arbitrarily large. In the extreme case of the join abstraction, we obtain an abstract model with no variability in it, that is, $\alpha^{join}(\mathcal{F})$ is an ordinary MTS.


Example 3. Recall the FTS VENDINGMACHINE of Fig. 1 with the set of valid configurations $\mathbb{K}^{VM}$ (see Example 1). Figure 3 shows $\alpha^{join}(\text{VENDINGMACHINE})$, where the allowed (may) part of the behavior includes the transitions that are associated with the optional features c, f, t in VENDINGMACHINE, whereas the required (must) part includes the transitions associated with the mandatory features v and s. Note that $\alpha^{join}(\text{VENDINGMACHINE})$ is an ordinary MTS with no variability. The abstraction of VENDINGMACHINE that retains only the feature c is an MFTS shown in [12, Appendix B], see Fig. 8. It has the singleton set of features $\mathbb{F} = \{c\}$ and limited variability $\mathbb{K} = \{c, \neg c\}$, where the mandatory features v and s are enabled.


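From the MFTS (resp., MTS) $\mathcal{MF}$, we define two FTSs (resp., TSs) $\mathcal{MF}^{may}$ and $\mathcal{MF}^{must}$ representing the may- and must-components of $\mathcal{MF}$, i.e. its may and must transitions, respectively. Thus, we have $[\![\mathcal{MF}^{may}]\!]_{FTS} = [\![\mathcal{MF}]\!]^{may}_{MFTS}$ and $[\![\mathcal{MF}^{must}]\!]_{FTS} = [\![\mathcal{MF}]\!]^{must}_{MFTS}$.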

We now show that the abstraction of an FTS is sound with respect to CTL*. First, we show two helper lemmas stating that: for any variant $k \in \mathbb{K}$ that can execute a behavior, there exists an abstract variant $k' \in \alpha(\mathbb{K})$ that executes the same may-behaviour; and for any abstract variant $k' \in \alpha(\mathbb{K})$ that can execute a must-behavior, there exists a variant $k \in \mathbb{K}$ that executes the same behaviour².


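Lemma 1. Let $\psi \in FeatExp(\mathbb{F})$, and $\mathbb{K}$ be a set of valid configurations over $\mathbb{F}$.

(i) Let $k \in \mathbb{K}$ and $k \models \psi$. Then there exists $k' \in \alpha(\mathbb{K})$ such that $k' \models \alpha(\psi)$.
(ii) Let $k' \in \alpha(\mathbb{K})$ and $k' \models \tilde\alpha(\psi)$. Then there exists $k \in \mathbb{K}$ such that $k \models \psi$.

Lemma 2.

(i) Let $k \in \mathbb{K}$ and $\rho \in [\![\pi_k(\mathcal{F})]\!]_{TS} \subseteq [\![\mathcal{F}]\!]_{FTS}$. Then there exists $k' \in \alpha(\mathbb{K})$ such that $\rho \in [\![\pi_{k'}(\alpha(\mathcal{F}))]\!]^{may}_{MTS} \subseteq [\![\alpha(\mathcal{F})]\!]^{may}_{MFTS}$ is a may-execution in $\alpha(\mathcal{F})$.
(ii) Let $k' \in \alpha(\mathbb{K})$ and $\rho \in [\![\pi_{k'}(\alpha(\mathcal{F}))]\!]^{must}_{MTS} \subseteq [\![\alpha(\mathcal{F})]\!]^{must}_{MFTS}$ be a must-execution in $\alpha(\mathcal{F})$. Then there exists $k \in \mathbb{K}$ such that $\rho \in [\![\pi_k(\mathcal{F})]\!]_{TS} \subseteq [\![\mathcal{F}]\!]_{FTS}$.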


As a result, every $\forall$CTL* (resp., $\exists$CTL*) property true for the may- (resp., must-) component of $\alpha(\mathcal{F})$ is true for $\mathcal{F}$ as well. Moreover, the MFTS $\alpha(\mathcal{F})$ preserves the full CTL*.


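Theorem 1 (Preservation results). For any FTS $\mathcal{F}$ and $(\alpha, \gamma)$, we have:

($\forall$CTL*) For every $\Phi \in \forall$CTL*, $\alpha(\mathcal{F})^{may} \models \Phi \implies \mathcal{F} \models \Phi$.
($\exists$CTL*) For every $\Phi \in \exists$CTL*, $\alpha(\mathcal{F})^{must} \models \Phi \implies \mathcal{F} \models \Phi$.
(CTL*) For every $\Phi \in$ CTL*, $\alpha(\mathcal{F}) \models \Phi \implies \mathcal{F} \models \Phi$.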


² Proofs of all lemmas and theorems in this section can be found in [12, Appendix A].
Abstract models are designed to be conservative for the satisfaction of properties. However, when a property is refuted, the counter-example found in the abstract model may be spurious (introduced due to abstraction) for some variants and genuine for the others. This can be established by checking which variants can execute the found counter-example.
Fig. 3. $\alpha^{join}$(VENDINGMACHINE).


Let $\Phi$ be a CTL* formula which is neither in $\forall$CTL* nor in $\exists$CTL*, and let $\mathcal{MF}$ be an MFTS. We verify $\mathcal{MF} \models \Phi$ by checking $\Phi$ on the two FTSs $\mathcal{MF}^{may}$ and $\mathcal{MF}^{must}$, and then we combine the obtained results as specified below.


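Theorem 2. For every $\Phi \in$ CTL* and MFTS $\mathcal{MF}$, we have:

$$[\mathcal{MF} \models \Phi] = \begin{cases} true & \text{if } (\mathcal{MF}^{may} \models \Phi \wedge \mathcal{MF}^{must} \models \Phi) \\ false & \text{if } (\mathcal{MF}^{may} \not\models \Phi \vee \mathcal{MF}^{must} \not\models \Phi) \end{cases}$$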


Therefore, we can check a formula $\Phi$ which is neither in $\forall$CTL* nor in $\exists$CTL* on $\alpha(\mathcal{F})$ by running a model checker twice, once with the may-component of $\alpha(\mathcal{F})$ and once with the must-component of $\alpha(\mathcal{F})$. On the other hand, a formula $\Phi$ from $\forall$CTL* (resp., $\exists$CTL*) is checked on $\alpha(\mathcal{F})$ by running a model checker only once, with the may-component (resp., must-component) of $\alpha(\mathcal{F})$.
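The resulting procedure can be summarized by the following Python sketch. The callback `holds`, standing for a call to a single-system model checker such as NuSMV, is hypothetical, and the verdict is either a guaranteed satisfaction or inconclusive, since a refutation on the abstraction may be spurious for some variants.

```python
def verify_abstraction(fragment, holds, may_ts, must_ts):
    """Combine single-system checks on the may/must components of alpha(F).

    fragment: "forall" (forall-CTL*), "exists" (exists-CTL*) or "full" (other CTL*).
    holds(ts): a hypothetical callback wrapping a single-system model checker.
    Returns (verdict, number_of_calls); verdict True means F |= Phi is guaranteed,
    None means inconclusive, because a refutation on the abstraction may be spurious.
    """
    calls = 0

    def run(ts):
        nonlocal calls
        calls += 1
        return holds(ts)

    if fragment == "forall":
        ok = run(may_ts)                    # Theorem 1 (forall-CTL*)
    elif fragment == "exists":
        ok = run(must_ts)                   # Theorem 1 (exists-CTL*)
    else:
        ok = run(may_ts) and run(must_ts)   # Theorem 2, then Theorem 1 (CTL*)
    return (True if ok else None), calls

print(verify_abstraction("full", lambda ts: True, "may", "must"))  # (True, 2)
```

The short-circuiting `and` means that, for a full CTL* formula, the second call is skipped once the may-component already refutes the property.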

The family-based model checking problem can be reduced to a number of smaller problems by partitioning the set of variants. Let the subsets $\mathbb{K}_1, \mathbb{K}_2, \ldots, \mathbb{K}_n$ form a partition of the set $\mathbb{K}$. Then: $\mathcal{F} \models \Phi$ iff $\pi_{\mathbb{K}_i}(\mathcal{F}) \models \Phi$ for all $i = 1, \ldots, n$. By using Theorem 1 (CTL*), we obtain the following result.


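Corollary 1. Let $\mathbb{K}_1, \mathbb{K}_2, \ldots, \mathbb{K}_n$ form a partition of $\mathbb{K}$, and $(\alpha_1, \gamma_1), \ldots, (\alpha_n, \gamma_n)$ be Galois connections. If $\alpha_1(\pi_{\mathbb{K}_1}(\mathcal{F})) \models \Phi, \ldots, \alpha_n(\pi_{\mathbb{K}_n}(\mathcal{F})) \models \Phi$, then $\mathcal{F} \models \Phi$.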
Therefore, in case of a suitable partitioning of $\mathbb{K}$ and the aggressive $\alpha^{join}$ abstraction, all $\alpha^{join}(\pi_{\mathbb{K}_i}(\mathcal{F}))^{may}$ and $\alpha^{join}(\pi_{\mathbb{K}_i}(\mathcal{F}))^{must}$ are ordinary TSs, so the family-based model checking problem can be solved using existing single-system model checkers with all the optimizations that these tools may already implement.
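To see why partitioning improves the precision of $\alpha^{join}$, consider the following Python sketch (our own illustration). It splits $\mathbb{K}^{VM}$ on the feature c, as done with $[\![c]\!]$ and $[\![\neg c]\!]$ in the running example, and reports whether a transition guarded by c becomes a must transition, a may-only transition, or disappears under the join abstraction and its dual.

```python
from itertools import product

K = {frozenset({"v", "s"} | {f for f, on in zip("tcf", bits) if on})
     for bits in product([False, True], repeat=3)}

def partition_by(K, features):
    """Split K into the parts whose configurations agree on the given features."""
    parts = {}
    for k in K:
        parts.setdefault(tuple(f in k for f in features), set()).add(k)
    return parts

def join_status(psi, part):
    """How alpha^join and its dual treat a transition with presence condition psi on 'part'."""
    if all(psi(k) for k in part):
        return "must"
    if any(psi(k) for k in part):
        return "may only"
    return "absent"

cancel = lambda k: "c" in k
parts = partition_by(K, ["c"])
print(join_status(cancel, K))                # may only
print(join_status(cancel, parts[(True,)]))   # must
print(join_status(cancel, parts[(False,)]))  # absent
```

Within each part the c-guarded transitions are therefore treated exactly, which is what allows genuine verdicts for $[\![c]\!]$ and $[\![\neg c]\!]$ separately.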
Example 4. Consider the properties introduced in Example 2. Using the TS $\alpha^{join}(\text{VENDINGMACHINE})^{may}$, we can verify $\Phi_1 = \forall\Box\forall\Diamond start$ (Theorem 1, ($\forall$CTL*)). We obtain a counter-example which is genuine for variants satisfying c. Hence, the variants from $[\![c]\!]$ violate $\Phi_1$. On the other hand, by verifying that $\alpha^{join}(\pi_{[\![\neg c]\!]}(\text{VENDINGMACHINE}))^{may}$ satisfies $\Phi_1$, we can conclude by Theorem 1 ($\forall$CTL*) that the variants from $[\![\neg c]\!]$ satisfy $\Phi_1$.

We can verify $\Phi_2 = \forall\Box\exists\Diamond start$ by checking the may- and must-components of $\alpha^{join}(\text{VENDINGMACHINE})$. In particular, we have $\alpha^{join}(\text{VENDINGMACHINE})^{may} \models \Phi_2$ and $\alpha^{join}(\text{VENDINGMACHINE})^{must} \models \Phi_2$. Thus, using Theorem 1 (CTL*) and Theorem 2, we have that VENDINGMACHINE $\models \Phi_2$.

Using $\alpha^{join}(\text{VENDINGMACHINE})^{must}$, we can verify $\Phi_3 = \exists\Box\exists\Diamond start$ by finding a witness execution. By Theorem 1 ($\exists$CTL*), we have that VENDINGMACHINE $\models \Phi_3$.


4 Implementation 


We now describe an implementation of our abstraction-based approach for CTL model checking of variational systems in the context of the state-of-the-art NuSMV model checker [3]. Since it is difficult to use FTSs to directly model very large variational systems, we use a high-level modelling language, called fNuSMV. Then, we show how to implement projection and variability abstractions as syntactic transformations of fNuSMV models.


A High-Level Modelling Language. fNuSMV is a feature-oriented extension of the input language of NuSMV, which was introduced by Plath and Ryan [28] and subsequently improved by Classen [4]. A NuSMV model consists of a set of variable declarations and a set of assignments. The variable declarations define the state space and the assignments define the transition relation of the finite state machine described by the given model. For each variable, there are assignments that define its initial value and its value in the next state, which is given as a function of the variable values in the present state. Modules can be used to encapsulate and factor out recurring submodels. Consider the basic NuSMV model shown in Fig. 4a. It consists of a single variable x which is initialized to 0 and does not change its value. The property (marked by the keyword SPEC) is "$\forall\Diamond(x \geq k)$", where k is a meta-variable that can be replaced with various natural numbers. For this model, the property holds when k = 0. In all other cases (for k > 0), a counterexample is reported where x stays 0.
The fNuSMV language [28] is based on superimposition. Features are modelled as self-contained textual units using a new FEATURE construct added to the NuSMV language. A feature describes the changes to be made to the given basic NuSMV model. It can introduce new variables into the system (in a section marked by the keyword INTRODUCE), override the definition of existing variables in the basic model and change the values of those variables when they are read (in a section marked by the keyword CHANGE). For example, Fig. 4b shows a FEATURE construct, called A, which changes the basic model in Fig. 4a. In particular, the feature A defines a new variable nA initialized to 0. The basic system is changed in such a way that when the condition "nA = 0" holds, then in the next state the basic system's variable x is incremented by 1, and in this case (when x is incremented) nA is set to 1. Otherwise, the basic system is not changed.
Classen [4] shows that fNuSMV and FTSs are expressively equivalent. He [4] also proposes a way of composing fNuSMV features with the basic model to create a single model in pure NuSMV which describes all valid variants. The information about the variability and features in the composed model is recorded in the states. This is a slight deviation from the encoding in FTSs, where this information is part of the transition relation. However, this encoding has the advantage of being implementable in NuSMV without drastic changes. In the composed model each feature becomes a Boolean state variable, which is non-deterministically initialised and whose value never changes. Thus, the initial states of the composed model include all possible feature combinations. Every change performed by a feature is guarded by the corresponding feature variable.

For example, the composition of the basic model and the feature A given in Figs. 4a and 4b results in the model shown in Fig. 4c. First, a module, called features, containing all features (in this case, the single one A) is added to the system. To each feature (e.g. A) corresponds one variable in this module (e.g. fA). The main module contains a variable named f of type features, so that all feature variables can be referenced in it (e.g. f.fA). In the next state, the variable x is incremented by 1 when the feature A is enabled (fA is TRUE) and nA is 0. Otherwise (TRUE: can be read as else:), x is not changed. Also, nA is set to 1 when A is enabled and x is incremented by 1. The property $\forall\Diamond(x \geq 0)$ holds for both variants, when A is enabled and when A is disabled (fA is FALSE).


```
MODULE main
VAR x : 0..1;
ASSIGN
  init(x) := 0;
  next(x) := x;
```

(a) The basic model.

```
FEATURE A
INTRODUCE
  VAR nA : 0..1;
  ASSIGN init(nA) := 0;
CHANGE
  IF (nA = 0) THEN
    IMPOSE next(x) := x + 1;
           next(nA) := next(x) = x + 1 ? 1 : nA;
```

(b) The feature A.

```
MODULE features
VAR fA : boolean;
ASSIGN
  init(fA) := {TRUE, FALSE};
  next(fA) := fA;

MODULE main
VAR f : features; x : 0..1; nA : 0..1;
ASSIGN
  init(x) := 0; init(nA) := 0;
  next(x) := case
      f.fA & nA = 0 : x + 1;
      TRUE : x;
    esac;
  next(nA) := case
      f.fA & nA = 0 & next(x) = x + 1 : 1;
      TRUE : nA;
    esac;
```

(c) The composed model M.

Fig. 4. NuSMV models.
Transformations. We present the syntactic transformations of fNuSMV models defined by projection and variability abstractions. Let M represent a model obtained by composing a basic model with a set of features $\mathbb{F}$. Let M contain a set of assignments of the form $s(v) := \texttt{case}\ b_1 : e_1; \ldots\ b_n : e_n;\ \texttt{esac}$, where $v$ is a variable, $b_i$ is a boolean expression, $e_i$ is an expression (for $1 \leq i \leq n$), and $s(v)$ is one of $v$, $init(v)$, or $next(v)$. We denote by $[\![M]\!]$ the FTS for this model [4].

Let $\mathbb{K}' \subseteq 2^{\mathbb{F}}$ be a set of configurations described by a feature expression $\psi'$, i.e. $[\![\psi']\!] = \mathbb{K}'$. The projection $\pi_{[\![\psi']\!]}([\![M]\!])$ is obtained by adding the constraint $\psi'$ to each $b_i$ in the assignments to the state variables.

Let $(\alpha, \gamma)$ be a Galois connection from Sect. 3. The abstract models $\alpha(M)^{may}$ and $\alpha(M)^{must}$ are obtained by the following rewrites for assignments in M:

$$\alpha(s(v) := \texttt{case}\ b_1 : e_1; \ldots\ b_n : e_n;\ \texttt{esac})^{may} = s(v) := \texttt{case}\ \alpha^{m}(b_1) : e_1; \ldots\ \alpha^{m}(b_n) : e_n;\ \texttt{esac}$$
$$\alpha(s(v) := \texttt{case}\ b_1 : e_1; \ldots\ b_n : e_n;\ \texttt{esac})^{must} = s(v) := \texttt{case}\ \tilde\alpha(b_1) : e_1; \ldots\ \tilde\alpha(b_n) : e_n;\ \texttt{esac}$$

The functions $\alpha^{m}$ and $\tilde\alpha$ copy all basic boolean expressions other than feature expressions, and recursively call themselves for all sub-expressions of compound expressions. For $\alpha^{join}(M)^{may}$, we have a single Boolean variable rnd which is non-deterministically initialized. Then, $\alpha^{m}(\psi) = rnd$ if $\alpha(\psi) = true$. We have: $\alpha([\![M]\!])^{may} = [\![\alpha(M)^{may}]\!]$ and $\alpha([\![M]\!])^{must} = [\![\alpha(M)^{must}]\!]$. For example, given the composed model M in Fig. 4c, the abstractions $\alpha^{join}(M)^{may}$ and $\alpha^{join}(M)^{must}$ are shown in Figs. 5 and 6, respectively. Note that $\tilde\alpha^{join}(f.fA) = false$, so the first branch of the case statements in M is never taken in $\alpha^{join}(M)^{must}$.
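The guard rewriting can be illustrated with a small, string-based Python sketch. The helper `rewrite_guard` is hypothetical and handles only conjunctions of positive feature atoms written as `f.fX`; a real implementation would operate on the NuSMV syntax tree. For the first branch of `next(x)` in Fig. 4c it yields `rnd & nA = 0` for the may-component and a guard that is always false for the must-component, consistent with that branch never being taken.

```python
def rewrite_guard(atoms, K, mode):
    """Rewrite a conjunctive guard b_i of a NuSMV case branch (hypothetical helper).

    atoms: conjuncts as strings; feature atoms are assumed to have the form 'f.fX'.
    K: valid configurations, as sets of feature names (e.g. {"A"}).
    mode "may":  the feature part becomes 'rnd' if some valid k satisfies it, else 'FALSE'.
    mode "must": it becomes 'TRUE' if every valid k satisfies it, else 'FALSE'.
    Other conjuncts are copied unchanged.
    """
    feats = [a[len("f.f"):] for a in atoms if a.startswith("f.f")]
    rest = [a for a in atoms if not a.startswith("f.f")]
    if not feats:
        return " & ".join(rest)
    sat = [all(x in k for x in feats) for k in K]
    if mode == "may":
        head = "rnd" if any(sat) else "FALSE"
    else:
        head = "TRUE" if all(sat) else "FALSE"
    return " & ".join([head] + rest)

K = [set(), {"A"}]               # feature A of Fig. 4 is optional
guard = ["f.fA", "nA = 0"]       # first branch of next(x) in Fig. 4c
print(rewrite_guard(guard, K, "may"))   # rnd & nA = 0
print(rewrite_guard(guard, K, "must"))  # FALSE & nA = 0
```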


```
MODULE main
VAR x : 0..1; nA : 0..1; rnd : boolean;
ASSIGN
  init(x) := 0; init(nA) := 0;
  init(rnd) := {TRUE, FALSE};
  next(x) := case
      rnd & nA = 0 : x + 1;
      TRUE : x;
    esac;
  next(nA) := case
      rnd & nA = 0 & next(x) = x + 1 : 1;
      TRUE : nA;
    esac;
```

Fig. 5. $\alpha^{join}(M)^{may}$.

```
MODULE main
VAR x : 0..1; nA : 0..1;
ASSIGN
  init(x) := 0; init(nA) := 0;
  next(x) := x;
  next(nA) := nA;
```

Fig. 6. $\alpha^{join}(M)^{must}$.


5 Evaluation 


We now evaluate our abstraction-based verification technique. First, we show how variability abstractions can turn a previously infeasible analysis of a variability model into a feasible one. Second, we show that instead of verifying CTL properties using the family-based version of NuSMV [7], we can use variability abstraction to obtain an abstract variability model (with a low number of variants) that can subsequently be model checked using the standard version of NuSMV.

All experiments were executed on a 64-bit Intel® Core™ i7-4600U CPU running at 2.10 GHz with 8 GB memory. The implementation, benchmarks, and all results obtained from our experiments are available from: https://aleksdimovski.github.io/abstract-ctl.html. For each experiment, we report the time needed to perform the verification task in seconds. The BDD model checker NuSMV is run with the parameter -df -dynamic, which ensures that the BDD package reorders the variables during verification in case the BDD size grows beyond a certain threshold.


Synthetic Example. As an experiment, we have tested the limits of family-based model checking with extended NuSMV and "brute-force" single-system model checking with standard NuSMV (where all variants are verified one by one). We have gradually added variability to the variational model in Fig. 4. This was done by adding optional features which increase the basic model's variable x by the number corresponding to the given feature. For example, the CHANGE section for the second feature B is: IF (nB = 0) THEN IMPOSE next(x) := x + 2; next(nB) := next(x) = x + 2 ? 1 : nB, and the domain of x is 0..3.

We check the assertion $\forall\Diamond(x \geq 0)$. For $|\mathbb{F}| = 25$ (for which $|\mathbb{K}| = 2^{25}$ variants, and the state space is 23?), the family-based NuSMV takes around 77 min to verify the assertion, whereas for $|\mathbb{F}| = 26$ it has not finished the task within two hours. The analysis time to check the assertion using "brute force" with standard NuSMV ascends to almost three years for $|\mathbb{F}| = 25$. On the other hand, if we apply the variability abstraction $\alpha^{join}$, we are able to verify the same assertion with only one call to standard NuSMV on the abstracted model, in 2.54 s for $|\mathbb{F}| = 25$ and in 2.99 s for $|\mathbb{F}| = 26$.


Elevator. The ELEVATOR model, designed by Plath and Ryan [28], contains about 300 LOC and 9 independent features: Antiprunk, Empty, Exec, OpenIfIdle, Overload, Park, QuickClose, Shuttle, and TTFull, thus yielding $2^9 = 512$ variants. The elevator serves a number of floors (five in our case) such that there is a single platform button on each floor which calls the elevator. The elevator will always serve all requests in its current direction before it stops and changes direction. When serving a floor, the elevator door opens and closes again. The size of the ELEVATOR model is 27° states. On the other hand, the sizes of $\alpha^{join}(\text{ELEVATOR})^{may}$ and $\alpha^{join}(\text{ELEVATOR})^{must}$ are 27° and 21° states, respectively.

We consider five properties. The $\forall$CTL property "$\Phi_1 = \forall\Box(floor = 2 \wedge liftBut5.pressed \wedge direction = up \Rightarrow \forall[direction = up\ \mathcal{U}\ floor = 5])$" states that, when the elevator is on the second floor with direction up and the button five is pressed, the elevator will go up until the fifth floor is reached. This property is violated by variants for which Overload (the elevator will refuse to close its doors when it is overloaded) is satisfied. Given sufficient knowledge of the system and the property, we can tailor an abstraction for verifying this property more effectively. We call standard NuSMV to check $\Phi_1$ on two models, $\alpha^{join}(\pi_{[\![Overload]\!]}(\text{ELEVATOR}))^{may}$ and $\alpha^{join}(\pi_{[\![\neg Overload]\!]}(\text{ELEVATOR}))^{may}$. For the first abstracted projection we obtain an "abstract" counter-example violating $\Phi_1$, whereas the second abstracted projection satisfies $\Phi_1$. Similarly, we can verify that the $\forall$CTL property "$\Phi_2 = \forall\Box(floor = 2 \wedge direction = up \Rightarrow \forall\bigcirc(direction = up))$" is satisfied only by variants with enabled Shuttle (the lift will change direction at the first and last floor). We can successfully verify $\Phi_2$ for $\alpha^{join}(\pi_{[\![Shuttle]\!]}(\text{ELEVATOR}))^{may}$ and obtain a counter-example for $\alpha^{join}(\pi_{[\![\neg Shuttle]\!]}(\text{ELEVATOR}))^{may}$. The $\exists$CTL property "$\Phi_3 = (OpenIfIdle \wedge \neg QuickClose) \Rightarrow \exists\Diamond\exists\Box(door = open)$" states that there exists an execution such that from some state on the door stays open. We can invoke the standard NuSMV to verify that $\Phi_3$ holds for $\alpha^{join}(\pi_{[\![OpenIfIdle \wedge \neg QuickClose]\!]}(\text{ELEVATOR}))^{must}$. The following two properties are neither in $\forall$CTL nor in $\exists$CTL. The property "$\Phi_4 = \forall\Box(floor = 1 \wedge idle \wedge door = closed \Rightarrow \exists\Box(floor = 1 \wedge door = closed))$" states that, for any execution, globally, if the elevator is on the first floor, idle, and its door is closed, then there is a continuation where the elevator stays on the first floor with closed door. The satisfaction of $\Phi_4$ can be established by verifying it against both $\alpha^{join}(\text{ELEVATOR})^{may}$ and $\alpha^{join}(\text{ELEVATOR})^{must}$ using two calls to standard NuSMV. The property "$\Phi_5 = Park \Rightarrow \forall\Box(floor = 1 \wedge idle \Rightarrow \exists[idle\ \mathcal{U}\ floor = 1])$" is satisfied by all variants with enabled Park (when idle, the elevator returns to the first floor). We can successfully verify $\Phi_5$ by analyzing $\alpha^{join}(\pi_{[\![Park]\!]}(\text{ELEVATOR}))^{may}$ and $\alpha^{join}(\pi_{[\![Park]\!]}(\text{ELEVATOR}))^{must}$ using two calls to standard NuSMV. We can see in Fig. 7 that the abstractions achieve significant speed-ups, being between 2.5 and 32 times faster than the family-based approach.

| Property | Family-based: size of $\mathbb{K}$ | Family-based: time | Abstraction-based: size of $\alpha(\mathbb{K})$ | Abstraction-based: time | Improvement |
|---|---|---|---|---|---|
| $\Phi_1$ | 512 | 36.73 s | 2 | 2.59 s | 14× |
| $\Phi_2$ | 512 | 35.89 s | 2 | 6.95 s | 5× |
| $\Phi_3$ | 512 | 54.76 s | 1 | 1.67 s | 32× |
| $\Phi_4$ | 512 | 2.65 s | 2 | 1.04 s | 2.5× |
| $\Phi_5$ | 512 | 37.76 s | 2 | 2.62 s | 15× |

Fig. 7. Verification of ELEVATOR properties using tailored abstractions. We compare the family-based approach vs. the abstraction-based approach.


6 Related Work and Conclusion 


Recently, many family-based techniques that work on the level of variational sys- 
tems have been proposed. This includes family-based syntax checking [20,25], 
family-based type checking [24], family-based static program analysis [16, 17,27], 
family-based verification [22,23,29], etc. In the context of family-based model 
checking, Classen et al. present FTSs [6] and specifically designed family-based 
model checking algorithms for verifying FTSs against LTL [5]. This approach
is extended [4,7] to enable verification of CTL properties using a family-based
version of NuSMV. In order to make this family-based approach more scalable,
the works [15,21] propose applying conservative variability abstractions on FTSs 
for deriving abstract family-based model checking of LTL. An automatic abstrac- 
tion refinement procedure for family-based model checking is then proposed in 
[19]. The application of variability abstractions for verifying real-time variational 
systems is described in [18]. The works [11,13] present an approach for family-based
software model checking of #ifdef-based (second-order) program families
using symbolic game semantics models [10].

To conclude, we have proposed conservative (over-approximating) variability
abstractions and their dual (under-approximating) counterparts to derive abstract
family-based model checking that preserves the full CTL*. The evaluation confirms
that interesting properties can be efficiently verified in this way. In this work, we 
assume that a suitable abstraction is manually generated before verification. If 
we want to make the whole verification procedure automatic, we need to develop 
an abstraction and refinement framework for CTL* properties similar to the one 
in [19] which is designed for LTL. 
Abstract. Feature-oriented software development (FOSD) is a promising
approach for developing a collection of similar software products
from a shared set of software assets. A well-recognized issue in FOSD is 
the analysis of feature interactions: cases where the integration of multi- 
ple features would alter the behavior of one or several of them. Existing 
approaches to feature interaction detection require a fixed order in which 
the features are to be composed but do not provide guidance as to how 
to define this order or how to determine a relative order of a newly- 
developed feature w.r.t. existing ones. In this paper, we argue that clas- 
sic feature non-commutativity analysis, i.e., determining when an order 
of composition of features affects properties of interest, can be used to 
complement feature interaction detection to help build orders between 
features and determine many interactions. To this end, we develop and 
evaluate Mr. Feature Potato Head (FPH) — a modular approach to non- 
commutativity analysis that does not rely on temporal properties and 
applies to systems expressed in Java. Our experiments running FPH on 
29 examples show its efficiency and effectiveness. 


1 Introduction 


Feature-oriented software development (FOSD) [3] is a promising approach for 
developing a collection of similar software products from a shared set of software 
assets. In this approach, each feature encapsulates a certain unit of functionality 
of a product; features are developed and tested independently and then inte- 
grated with each other; developed features are then combined in a prescribed 
manner to produce the desired set of products. A well-recognized issue in FOSD 
is that it is prone to creating feature interactions [2,13,22,28]: cases where inte- 
grating multiple features alters the behavior of one or several of them. Not all 
interactions are desirable. E.g., the Night Shift feature of the recent iPhone did 
not allow the Battery Saver to be enabled (and the interaction was not fixed 
for over 2 months, potentially affecting millions of iPhone users). More critically, 
in 2010, Toyota had to recall hundreds of thousands of Prius cars due to an 
interaction between the regenerative braking system and the hydraulic braking
system that caused 62 crashes and 12 injuries. 

Existing approaches for identifying feature interactions either require an
explicit order in which the features are to be composed [6,8,18,19,26] or assume
the presence of a “150%” representation which uses an implicit feature order [12,15].
Yet they do not provide guidance on how to define this order, or how to deter- 
mine a relative order of a newly-developed feature w.r.t. existing ones. 

A classical approach to feature non-commutativity detection, defined by
Plath and Ryan [25], can be used to help build a composition order. The authors
defined non-commutativity as “the presence of a property, the value of which is
different depending on the order of the composition of the features” and proposed
a model-checking approach that checks the available properties on different
composition orders. E.g., consider the Elevator System [14,25] consisting
of five features: Empty — to clear the cabin buttons when the elevator is empty; 
ExecutiveFloor — to override the value of the variable stop to give priority to the 
executive floor (not stopping in the middle); TwoThirdsFull — to override the 
value of stop not allowing people to get into the elevator when it is two-thirds 
full; Overloaded — to disallow closing of the elevator doors while it is overloaded; 
and Weight — to allow the elevator to calculate the weight of the people inside 
the cabin. Features TwoThirdsFull and ExecutiveFloor are not commutative
(e.g., the value of the property “the elevator does not stop at other floors when
there is a call from the executive floor” changes under different composition orders),
while Empty and Weight commute. Thus, an order between Empty and Weight is
not required, whereas the user needs to determine which of TwoThirdsFull or
ExecutiveFloor should get priority. In general, feature non-commutativity guarantees
a feature interaction, whereas feature commutativity means that the order of
composition does not matter. Both of these outcomes can effectively complement
other feature interaction approaches.

In this paper, we aim to make commutativity analysis practical and appli- 
cable to a broad range of modern feature-based systems, so that it can be used 
as “the first line of defense” before running other feature interaction detection
techniques. There are three main issues we need to tackle. First, proving that
features commute requires checking their composition against all properties, and
capturing the complete behavior of features in the form of formal specifications
is infeasible. Thus, we aim to make our approach property-independent.
Second, we need to make commutativity analysis scalable and avoid rechecking 
the entire system every time a single feature is modified or a new one is added. 
Finally, we need to support analysis of systems expressed in modern program- 
ming languages such as Java. 

In [25], features execute “atomically” in a state-machine representation of the 
system, i.e., they make all state changes in one step. However, when systems are 
represented in conventional programming languages like Java, feature execution 
may take several steps; furthermore, such features are composed sequentially, 
using superimposition [5]. Examining properties defined by researchers studying 
such systems [6], we note that they do not refer to intermediate states within 
the feature execution, but only to states before or after running the feature,
effectively treating features as atomic. In this paper, we use this notion of
atomicity to formalize commutativity. The foundation of our technique is the
separation between feature behavior and feature composition, together with an
efficient check of whether different feature composition orders leave the system
in the same internal state. If they do not, a property distinguishing between the orders can be found,
and thus they do not commute. We call the technique and the accompanying 
tool Mr. Feature Potato Head (FPH), named after the kids’ toy which can be 
composed from interchangeable parts. 

In this paper, we show that FPH can perform commutativity analysis in 
an efficient and precise manner. It performs modular checking of pairs of
features [17], which makes the analysis very scalable: when a feature is modified, the
analysis can focus only on the interactions related to that feature, without need- 
ing to consider the entire family. That is, once the initial analysis is completed, 
a partial order between the features of the given system can be created and used 
for detecting other types of interactions. Any feature added in the future will be 
checked against all other features for non-commutativity-related interactions to 
define its order among the rest of the features, but the existing order will not
be affected. In this paper, we only focus on the non-commutativity analysis and
consider interaction resolution as being out of scope. 


Contributions. This paper makes the following contributions: (1) It defines 
commutativity for features expressed in imperative programming languages and 
composed via superimposition. (2) It proposes a novel modular representation 
for features that distinguishes between feature composition and behavior. (3) 
It defines and implements a modular specification-free feature commutativity 
analysis that focuses on pairs of features rather than on complete products or 
product families. (4) It instantiates this analysis on features expressed in Java. 
(5) It shows that the implemented analysis is effective for detecting instances of 
non-commutativity as well as proving their absence. (6) It evaluates the efficiency 
and scalability of the approach. 

The rest of the paper is organized as follows. We provide the necessary back- 
ground, fix the notation and define the notion of commutativity in Sect. 2. In 
Sect. 3, we describe our iterative tool-supported methodology for detecting fea- 
ture non-commutativity for systems expressed in Java. We evaluate the effective- 
ness and scalability of our approach in Sect. 4, compare our approach to related 
work in Sect. 5 and conclude in Sect. 6.¹


2 Preliminaries 


In this section, we present the basic concepts and definitions and define the 
notion of commutativity used throughout this paper. 


1 The complete replication package, including the tool binary, the case studies used in
our experiments and proofs of selected theorems, is available at
https://github.com/FeaturePotatoHead/FPH.
1 package ElevatorSystem;
2 public class Elevator {
3   int executiveFloor = 4;
4   public boolean isExecutiveFloor(int floorID) {...}
5   public boolean isExecutiveFloorCalling() {...}
6   private boolean stopRequestedAtCurrentFloor() {...}
7   private boolean stopRequestedInDirection(Direction dir, boolean respectFloorCalls, boolean respectInLiftCalls) {
8     if (isExecutiveFloorCalling()) { ... }
9     else return original(dir, respectFloorCalls, respectInLiftCalls);
10  }


Fig. 1. Java code snippet of the feature ExecutiveFloor. 


Feature-Oriented Software Development (FOSD). In FOSD, products are 
specified by a set of features (configuration). A base system has no features. 
While defining the notion of a feature is an active research topic [11], in this 
paper we assume that a feature is “a structure that extends and modifies the 
structure of a given program in order to satisfy a stakeholder’s requirement, to 
implement a design decision and to offer a configuration option” [5]. 


Superimposition. Superimposition is a feature composition technique that 
composes software features by merging their corresponding substructures. Based 
on superimposition, Apel et al. [5] propose a composition technique where
different components are represented using a uniform and language-independent
structure called a feature structure tree (FST). An FST is a tree T =
((Terminal Node) | (Non Terminal Node) (Tree T)+), where + denotes “one or 
more”. A Non Terminal Node is a tuple (name, type) which represents a non-leaf 
element of T with the respective name and type. A Terminal Node is a tuple 
(name, type, body) which represents a leaf element of T. In addition to name and 
type, each Terminal Node has body that encapsulates the content of the element, 
i.e., the corresponding method implementation or field initializer. A feature is a 
tuple f = (name,T), where name is a string representing f’s name and T is an 
FST abstractly representing f. 
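The FST definition above can be made concrete with a small data structure. The following is a minimal Java sketch under our own assumptions: the class `Fst` and its methods `terminal`, `nonTerminal`, `terminalPaths` and `executiveFloor` are hypothetical names, not FeatureHouse's API, and a node counts as Terminal exactly when it carries a body. It rebuilds the ExecutiveFloor tree described in the text (with placeholder bodies) and lists the root-to-leaf path of every Terminal Node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a feature structure tree (FST); not FeatureHouse's API.
final class Fst {
    // A Terminal Node carries a body (method implementation or field
    // initializer) and has no children; a Non Terminal Node has body == null.
    static final class Node {
        final String name;
        final String type;
        final String body;
        final List<Node> children = new ArrayList<>();

        Node(String name, String type, String body) {
            this.name = name;
            this.type = type;
            this.body = body;
        }

        boolean isTerminal() {
            return body != null;
        }

        // Appends a child and returns this node, so trees can be built fluently.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    static Node nonTerminal(String name, String type) {
        return new Node(name, type, null);
    }

    static Node terminal(String name, String type, String body) {
        return new Node(name, type, body);
    }

    // Collects the dot-separated path from the root to every Terminal Node.
    static List<String> terminalPaths(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(Node n, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? n.name : prefix + "." + n.name;
        if (n.isTerminal()) {
            out.add(path);
            return;
        }
        for (Node c : n.children) {
            collect(c, path, out);
        }
    }

    // The ExecutiveFloor tree described in the text; bodies are placeholders.
    static Node executiveFloor() {
        Node elevator = nonTerminal("Elevator", "class")
                .add(terminal("executiveFloor", "field", "4"))
                .add(terminal("isExecutiveFloor", "method", "{...}"))
                .add(terminal("isExecutiveFloorCalling", "method", "{...}"))
                .add(terminal("stopRequestedAtCurrentFloor", "method", "{...}"))
                .add(terminal("stopRequestedInDirection", "method", "{... original(...) ...}"));
        return nonTerminal("ElevatorSystem", "package").add(elevator);
    }
}
```

For example, `Fst.terminalPaths(Fst.executiveFloor())` yields five paths, one of which is `ElevatorSystem.Elevator.stopRequestedInDirection`.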

Each feature describes the modifications that need to be made to the base 
system, also represented by an FST, to enable the behavior of the feature. While 
FSTs are generally language-independent, in this paper we focus on features 
defined in a Java-based language. For example, consider the Java code snippet 
in Fig. 1, which shows the feature ExecutiveFloor. This feature makes one of the floors
“an executive one”. If there is a call to or from this floor, it gets priority over any
other call. This feature is written in Java using the special keyword original [5]
(line 9). With this keyword, a call from the new method to every existing
method with the same name is added, in order to preserve the original behavior.
Without original, new methods replace existing ones.

The feature ExecutiveFloor in Fig. 1 is represented by the tuple (executive, T), 
where T is the FST in Fig. 2. ElevatorSystem is a Non Terminal Node that represents
the ElevatorSystem package with the tuple (ElevatorSystem, package), and
stopRequestedInDirection is a Terminal Node represented by
(stopRequestedInDirection, method, body), where body is the content of
the stopRequestedInDirection method in Fig. 1 (lines 8-9). Another Non
Terminal Node is Elevator, whereas executiveFloor, isExecutiveFloor,
isExecutiveFloorCalling and stopRequestedAtCurrentFloor are Terminal Nodes.
[Figure: the package node ElevatorSystem has the class node Elevator as its child, whose children are the method nodes isExecutiveFloor, isExecutiveFloorCalling, stopRequestedAtCurrentFloor and stopRequestedInDirection and the field node executiveFloor.]

Fig. 2. FST representation for the feature ExecutiveFloor.


[Figure: the base ElevatorSystem FST (classes such as Floor and Elevator, plus more classes, methods and fields) superimposed with the ExecutiveFloor FST; overridden Terminal Nodes are dashed and added nodes such as isExecutiveFloor and executiveFloor are shaded.]

Fig. 3. Simplified composition of ExecutiveFloor and the base elevator system.


For Java-specified features, Terminal Nodes represent methods, fields, import 
statements, modifier lists, as well as extends, implements and throws clauses,
whereas directories, files, packages and classes are represented by Non Terminal Nodes.


Superimposition Process. Given two FSTs, starting from the root and pro- 
ceeding recursively to create a new FST, two nodes are composed when they 
share the same name and type and when their parent nodes have been com- 
posed. For Terminal Nodes which additionally have a body, if a Node A is com- 
posed with a Node B, the body of A is replaced by that of B unless the keyword 
original is present in the body of B. In this case, the body of A is replaced by 
that of B and the keyword is replaced by A’s body. Since the original keyword 
is not used for fields, the body of the initial field is always replaced by that of 
the new one. 
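The superimposition rules above can be sketched in Java as follows. This is an illustration under stated assumptions, not FeatureHouse's implementation: bodies are plain strings, the keyword is matched textually as the hypothetical token `original()`, and the two roots are assumed to match. A real implementation works on parsed code, so the string substitution here only mirrors the rule as stated (the keyword in B's body is replaced by A's body).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of FST superimposition as described in the text.
// Bodies are plain strings; a real implementation works on parsed code.
final class Superimpose {
    static final class Node {
        final String name;
        final String type;
        String body;                                   // null for Non Terminal Nodes
        final List<Node> children = new ArrayList<>();

        Node(String name, String type, String body) {
            this.name = name;
            this.type = type;
            this.body = body;
        }

        Node add(Node c) {
            children.add(c);
            return this;
        }

        // Finds the child with the same name and type, or null if none exists.
        Node child(String name, String type) {
            for (Node c : children) {
                if (c.name.equals(name) && c.type.equals(type)) return c;
            }
            return null;
        }
    }

    static final String ORIGINAL = "original()";       // hypothetical keyword token

    // Composes B onto a copy of A (inputs are not modified); roots assumed to match.
    static Node compose(Node a, Node b) {
        Node result = copy(a);
        merge(result, b);
        return result;
    }

    private static void merge(Node a, Node b) {
        if (b.body != null) {                          // Terminal Node
            if (b.type.equals("method") && b.body.contains(ORIGINAL)) {
                a.body = b.body.replace(ORIGINAL, a.body); // keyword becomes A's body
            } else {
                a.body = b.body;                       // override (always for fields)
            }
            return;
        }
        for (Node bc : b.children) {
            Node ac = a.child(bc.name, bc.type);
            if (ac == null) a.add(copy(bc));           // node only in B: add it
            else merge(ac, bc);                        // same name and type: compose
        }
    }

    private static Node copy(Node n) {
        Node c = new Node(n.name, n.type, n.body);
        for (Node ch : n.children) c.add(copy(ch));
        return c;
    }
}
```

Copying A before merging keeps the base FST reusable, which matters when the same base is composed with features in several orders.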

Figure 3 gives an example of a composition of a simplified ExecutiveFloor
feature with the elevator base system. Terminal Nodes that have been overridden
by the feature are shown with a dashed outline, and new fields and methods added by the
feature are shown as shaded nodes. For example, the method stopRequested,
which is part of the base system, is overridden by the feature, whereas the field 
executiveFloor, which is only part of the feature, is added to the base system. 


Commutativity. We define commutativity w.r.t. properties observable before 
or after features finish their execution (such as those in [6]). A state of the system
after superimposing a feature is the valuation of each variable (or array, object, 
field, etc. [24]) of the base system and each variable (or array, etc.) introduced 
by the feature. We also add a new variable inBase which is true iff this state is 
not within a method overridden by any feature. In the rest of the paper, we refer 
to states where inBase is true as inBase states. A transition of the system is an 
execution of a statement, including method calls and return statements [24]. 


324 M. Chechik et al. 


Then we say that two features commute if every property of the form
$G(\mathit{inBase} \Rightarrow \varphi)$, where $\varphi$ is a propositional formula over
the system state variables, has the same value regardless of the order in which they
are composed. That is, they do not commute if there is at least one state of the base
system which changes depending on the order in which the features are composed.
For example, the property “the elevator does not stop at other floors when there is a
call from the executive floor”, used in Sect. 1 to identify non-commutativity between
features TwoThirdsFull and ExecutiveFloor, is
$G(\mathit{inBase} \Rightarrow \neg(\mathit{isExecutiveFloorCalling} \land \mathit{stopped} \land \mathit{floor} \neq \mathit{executiveFloor}))$.
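To illustrate what a distinguishing property looks like, the sketch below evaluates G(inBase ⇒ φ) for the example property over two finite traces that stand in for executions of F1; F2 and F2; F1. This is only a hypothetical illustration: the definition above ranges over all executions, and the record `ElevatorState` with its fields is our own simplified state model, not code from the elevator case study.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration: finite traces stand in for executions of F1;F2 and F2;F1.
final class InBaseCheck {
    // Simplified state model (our own assumption, not the case study's code).
    record ElevatorState(boolean inBase, boolean executiveFloorCalling,
                         boolean stopped, int floor, int executiveFloor) {}

    // G(inBase => phi) on a finite trace: phi must hold in every inBase state.
    static <S> boolean globallyInBase(List<S> trace, Predicate<S> inBase, Predicate<S> phi) {
        return trace.stream().allMatch(s -> !inBase.test(s) || phi.test(s));
    }

    // phi = not (isExecutiveFloorCalling and stopped and floor != executiveFloor)
    static boolean noStopElsewhere(ElevatorState s) {
        return !(s.executiveFloorCalling() && s.stopped() && s.floor() != s.executiveFloor());
    }

    // True if the property has different values under the two composition orders.
    static boolean distinguishes(List<ElevatorState> order12, List<ElevatorState> order21) {
        Predicate<ElevatorState> inBase = ElevatorState::inBase;
        Predicate<ElevatorState> phi = InBaseCheck::noStopElsewhere;
        return globallyInBase(order12, inBase, phi) != globallyInBase(order21, inBase, phi);
    }
}
```

States where inBase is false (i.e., inside an overridden method) are skipped, so a violation that only occurs mid-feature does not make the orders distinguishable.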


3 Methodology 


Our goal is to provide a scalable technique for determining whether features commute
by establishing whether the two different composition orders leave the system in the
same internal state. The workflow of FPH is shown below. The first step of FPH is to
transform each feature from an FST into an FPH representation consisting of a set of
fragments. The base is transformed in the same way as the individual features. Each
fragment is further split into feature behavior and feature composition (see Sect. 3.1).
Afterwards, we check for non-commutativity. If there do not exist feature fragments that
have a shared location of composition, i.e., whose feature composition components
are the same, then the features commute. Otherwise, we check the pairs of feature
fragments for behavior preservation, i.e., whether, when the two features are composed
in the same location, the previous behavior is still present and can be executed. If
this check succeeds, we perform the shared variables check (see Sect. 3.2).

[Figure: FPH workflow: separate features into fragments; Shared Location? No: the features commute; Yes: check behavior preservation, then shared variables.]


3.1 Separating Feature Behavior and Composition 


We now formally define the FPH representation of features that separates the 
behavior of features and location of their composition and provide transforma- 
tion operators between the FPH and the FST representations. 


Definition 1. An FPH feature is a tuple (name, fragments), where name is the 
feature name and fragments is the list of feature fragments that comprise the 
feature. Let a feature f be given. A Feature Fragment fg is a tuple (fb, fc), where 
fb is a feature behavior defined in Definition 2 and fc is a feature composition 
defined in Definition 3. 


Definition 2. Feature Behavior fb of a feature fragment fg is a tuple
(name, type, body, bp, vars), where name, type and body represent the name,
type and content, respectively, of the element represented by fg. bp is a boolean 
value which is set to true if the feature preserves the original behavior, i.e., 
when the keyword original is present in the body and not within a conditional 
statement. vars is a list of variable names read or written within fg. 
Definition 3. Feature composition fc of a feature fragment fg is represented by
(location), which is the path leading to the Terminal Node represented by fg.


The Separate operator (see Fig. 4a) transforms features from the FST to the 
FPH representation by creating a new fragment for each Terminal Node in the 
given FST. For the behavior component of the fragment, its name, type and body 
attributes come from the respective counterparts of the FST Terminal Node. 
The bp field is true if every path within body contains the keyword original; 
otherwise, it is false. For the composition component, the location field gets its 
value from the unique path to the Terminal Node from the root of the FST. vars 
are the parameters of the method and the fields that are used within it. 

E.g., consider creating the FPH representation for the ExecutiveFloor feature in
Fig. 2. Since there are five Terminal Nodes, five fragments will be created, one to
represent each node. In the fragment created for the stopRequestedInDirection
node, the information in fb about name, type and body is derived from the
information stored in the node, fb = (stopRequestedInDirection, method, [body]),
where body consists of lines 8-9 of Fig. 1. bp is false since the keyword original
is within an if statement, and vars consists only of the method parameters since
the method does not use any global fields. After separating, the feature
composition is fc = ElevatorSystem.Elevator.stopRequestedInDirection.
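A minimal Java sketch of Separate, under simplifying assumptions of our own. The record names mirror Definitions 1 to 3 but are hypothetical; a Terminal Node's body is modeled as a list of top-level statements; vars is taken as a precomputed input (FPH derives variable information with Soot); and bp uses a crude syntactic stand-in for “original occurs on every path”: some top-level statement calls original( and does not start with if, else or switch. The location is the dot-separated path from the root, as described in the text. On the stopRequestedInDirection node this yields bp = false, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Separate operator.
final class Separate {
    // Input FST node; a Terminal Node has a non-null list of top-level statements.
    // 'vars' is assumed to be precomputed (FPH obtains it via Soot).
    record Node(String name, String type, List<String> statements,
                List<String> vars, List<Node> children) {
        boolean isTerminal() { return statements != null; }
    }

    record FeatureBehavior(String name, String type, List<String> body,
                           boolean bp, List<String> vars) {}
    record FeatureComposition(String location) {}
    record Fragment(FeatureBehavior fb, FeatureComposition fc) {}

    // Crude stand-in for "original occurs on every path": some top-level
    // statement calls original( and is not part of a conditional.
    static boolean preservesOriginal(List<String> statements) {
        for (String s : statements) {
            String t = s.strip();
            boolean conditional = t.startsWith("if") || t.startsWith("else")
                    || t.startsWith("switch");
            if (t.contains("original(") && !conditional) return true;
        }
        return false;
    }

    // One fragment per Terminal Node; location = path from the FST root.
    static List<Fragment> separate(Node root) {
        List<Fragment> fragments = new ArrayList<>();
        walk(root, "", fragments);
        return fragments;
    }

    private static void walk(Node n, String prefix, List<Fragment> out) {
        String path = prefix.isEmpty() ? n.name() : prefix + "." + n.name();
        if (n.isTerminal()) {
            FeatureBehavior fb = new FeatureBehavior(n.name(), n.type(), n.statements(),
                    preservesOriginal(n.statements()), n.vars());
            out.add(new Fragment(fb, new FeatureComposition(path)));
            return;
        }
        for (Node c : n.children()) walk(c, path, out);
    }
}
```

A faithful bp check would inspect the method's control-flow graph; the prefix test above only handles the shape of the example.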

To transform features from FPH back to FST, we define the Join operator. 
It takes as input a list of feature fragments and returns an FST (see Fig. 4b). 


(a) Separate
Input: FST representation of F
Output: Fragments of F {f = (fb, fc)}
begin
  fragment list ← []
  forall Terminal Nodes n ∈ FST do
    f = (fb, fc) ← new Feature Fragment
    fc.location ← path from the root of FST to n
    fb.name ← n.name
    fb.type ← n.type
    fb.body ← n.body
    fb.vars ← n.getVariables()
    if fb.body contains original in every path then fb.bp ← true
    else fb.bp ← false
    add f to fragment list
  return fragment list

(b) Join
Input: Set of fragments of F {(fb, fc)}
Output: FST representation of F
begin
  forall (fb, fc) ∈ F do
    t ← new Terminal Node
    t.name ← fb.name
    t.type ← fb.type
    t.body ← fb.body
    forall node ∈ fc.location do
      if node ∉ FST then add node to FST
      else continue
  return FST

(c) CheckCommutativity
Input: Fragments of F1 and F2 {fj = (fbj, fcj)} with j ∈ {1, 2}
Output: Yes if F1 and F2 commute, No otherwise
1 begin
2   forall (fb1, fc1) ∈ F1, (fb2, fc2) ∈ F2 do
3     if (fc1 = fc2) then
4       if (fb1.bp = false ∨ fb2.bp = false) then
5         return No
6       if (fb1.vars ∩ fb2.vars) ≠ ∅ then
7         return No
8   return Yes


Fig. 4. Algorithms Separate, Join and CheckCommutativity. 
It creates a new Terminal Node to be added to the FST for each feature fragment
in the given feature. The name, type and body attributes of the node are filled 
using the corresponding fields in the feature behavior component of the fragment. 
Then, starting from the root node, for every node in the location path of the 
feature composition component, if the node does not exist in the FST, it is 
added; otherwise, the next node of the path is examined. The information about 
bp and vars is already contained in the body of the Terminal Node and is no 
longer considered as a separate field. E.g., joining the ExecutiveFloor feature 
that we previously separated yields the FST in Fig. 2, as expected. 
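The following Java sketch pairs a Join in the style described above with a matching Separate, so that the round trip stated in Theorem 1 can be exercised on a small example. It is a hypothetical illustration: each location is modeled as a list of (name, type) segments running from the root to the Terminal Node itself, so that Non Terminal Nodes can be recreated with their types, and behavior is reduced to name, type and body.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Join with a matching Separate (round trip as in Theorem 1).
final class JoinSketch {
    record Segment(String name, String type) {}

    // Behavior reduced to name, type and body; bp and vars are derivable from
    // the body, which is why Join does not store them separately.
    record Fragment(String name, String type, String body, List<Segment> location) {}

    static final class Node {
        final String name, type, body;                 // body == null: Non Terminal
        final List<Node> children = new ArrayList<>();

        Node(String name, String type, String body) {
            this.name = name;
            this.type = type;
            this.body = body;
        }

        Node find(String n, String t) {
            for (Node c : children) {
                if (c.name.equals(n) && c.type.equals(t)) return c;
            }
            return null;
        }
    }

    // Join: assumes a non-empty list whose locations all start at the same root
    // and end with the Terminal Node itself (root, ..., terminal).
    static Node join(List<Fragment> fragments) {
        Segment root = fragments.get(0).location().get(0);
        Node tree = new Node(root.name(), root.type(), null);
        for (Fragment f : fragments) {
            List<Segment> loc = f.location();
            Node cur = tree;
            for (Segment s : loc.subList(1, loc.size() - 1)) {
                Node next = cur.find(s.name(), s.type());
                if (next == null) {                    // node missing: add it
                    next = new Node(s.name(), s.type(), null);
                    cur.children.add(next);
                }
                cur = next;                            // then examine the next segment
            }
            cur.children.add(new Node(f.name(), f.type(), f.body()));
        }
        return tree;
    }

    // Separate: one fragment per Terminal Node, location = path from the root.
    static List<Fragment> separate(Node root) {
        List<Fragment> out = new ArrayList<>();
        walk(root, List.of(), out);
        return out;
    }

    private static void walk(Node n, List<Segment> path, List<Fragment> out) {
        List<Segment> here = new ArrayList<>(path);
        here.add(new Segment(n.name, n.type));
        if (n.body != null) {
            out.add(new Fragment(n.name, n.type, n.body, List.copyOf(here)));
            return;
        }
        for (Node c : n.children) walk(c, here, out);
    }
}
```

In this sketch `separate(join(fs))` returns a list equal to `fs` whenever the fragments are listed in tree order, which is the property the theorem asserts for the actual operators.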


Theorem 1. Let n be the number of features in a system. For every feature F 
which can be represented as (fb, fc), Join and Separate are inverses of each 
other, i.e., Join(Separate(F)) = F and Separate(Join(fb, fc)) = (fb, fc).


3.2 Compositional Analysis of Non-commutativity 


We now formally present the algorithm CheckCommutativity, a sequence of
increasingly more precise, and more expensive, static checks to perform non- 
commutativity analysis. These are called shared location, behavior preservation 
and shared variables — see Fig. 4c. Additionally, we prove soundness and correct- 
ness of the FPH methodology, i.e., that our checks guarantee feature commuta- 
tivity as defined in Sect. 2. 


Check Shared Location. The first check examines whether F1 and F2 have
any fragments that can be composed in the same location (line 3). Clearly, when
F1 and F2 are applied in different places, e.g., they change different methods,
inBase states are the same independently of their order of composition, and
thus the features commute. Otherwise, more precise checks are required. E.g.,
ExecutiveFloor (see Fig. 2) and Empty (see Fig. 5a) do not share methods or
fields and thus can be applied in either order.


Theorem 2. If features F1 and F2 are not activated in the same location, any
inBase state resulting from first composing F1 followed by F2 (denoted F1; F2) is
the same as for F2; F1.


Check Behavior Preservation. Suppose a pair of feature fragments of F1
and F2, say f1 and f2, can be composed in the same location. Then we examine
whether the original behavior is preserved or overridden (indicated by the bp
field of each fragment). If bp of both f1 and f2 is true, an additional check for shared
variables is applied. Otherwise, i.e., when bp of either f1 or f2 is false, we report
an interaction. Clearly, this check can introduce false positives because we do not
look at the content of the methods but merely at the presence of the original
keyword. E.g., two methods may happen to perform the exact same operation
and yet not include the original keyword. In this case, we would falsely detect
an interaction.²

[Figure: FSTs of two elevator features, (a) Empty and (b) Weight, each rooted at ElevatorSystem → Elevator; the method nodes shown include leaveElevator and enterElevator.]

Fig. 5. Two features of the elevator system.


Check Shared Variables. If F1 and F2 are activated at the same place and
both preserve the original behavior, commutativity of their composition depends
on whether they have shared variables that can be both read and written. This
check aims to detect that. E.g., both features Empty (see Fig. 5a) and Weight (see
Fig. 5b) modify the leaveElevator method and preserve the original behavior.
Since they share no variables, the order of composition does not
affect the execution of the resulting system.

Extracting shared variable information requires not only identifying which 
variable is part of each feature behavior, but also running points-to analysis 
since aliasing is very common in Java. Moreover, a shared variable might not 
appear in the body of the affected method but instead in the body of a method 
called by it. Yet existing frameworks for implementing interprocedural points-to 
analyses [21] may not correctly identify all variables read and written within 
a method. Moreover, even if two features do write to the same location, this
may not indicate a feature interaction; e.g., they may write the same value. For
these reasons, our shared variables check may introduce false positives and false 
negatives. We evaluate its precision in Sect. 4. 


Theorem 3. Let features F1 and F2, activated at the same place and preserving
the behavior of the base, be given. If the variables read and written by each feature
are correctly identified and independent of each other ($F_1.\mathit{vars} \cap F_2.\mathit{vars} = \emptyset$),
then any inBase state resulting from composing F1; F2 is the same as that of
composing F2; F1.


When two features merely read the same variable, it does not present an inter- 
action problem. We handle this case in our implementation (see Sect. 4). 


Theorem 4 (Soundness). Given features F1 and F2, if the variables read and
written by them are correctly identified, the algorithm in Fig. 4c is sound: when it
outputs Yes, F1 and F2 commute.
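Fig. 4c translates almost directly into Java. In this sketch, fragments are reduced to the three fields the algorithm inspects (location, bp and vars); the record name and the variable names used in the example below are hypothetical, and vars is assumed to be computed correctly, which is exactly the hypothesis of Theorem 4. The method returns true for Yes and false for No.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical transcription of CheckCommutativity (Fig. 4c).
final class CheckCommutativity {
    // Only the fields the algorithm inspects: fc.location, fb.bp and fb.vars.
    record Fragment(String location, boolean bp, Set<String> vars) {}

    static boolean commute(List<Fragment> f1, List<Fragment> f2) {
        for (Fragment a : f1) {
            for (Fragment b : f2) {
                if (!a.location().equals(b.location())) continue;       // shared location?
                if (!a.bp() || !b.bp()) return false;                   // behavior preservation
                if (!Collections.disjoint(a.vars(), b.vars())) return false; // shared variables
            }
        }
        return true;
    }
}
```

E.g., two fragments at `ElevatorSystem.Elevator.leaveElevator` that both preserve behavior and use disjoint (hypothetical) variable sets such as `buttonsCleared` and `weight` commute, whereas a fragment at that location with bp = false makes the check answer No.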


Complexity. Let |F| be the number of features in the system and let M be the
largest number of fragments that each feature can have. For a pair of feature
fragments, checking shared location and checking behavior preservation are both
done in constant time, so the overall complexity of these steps is $O((|F| \times M)^2)$.
In the worst case, all features affect the same set of methods and thus the shared
variables check should be run on all of them. Yet, all fragments in a feature are
non-overlapping, and thus the number of these checks is at most $|F|^2 \times M$.


² But this does not happen often (see Sect. 4).




The time to perform a shared variable check, which we denote by SV, can vary
depending on the implementation and can be PSPACE-hard.
Thus, the overall complexity of non-commutativity detection is
$O((|F| \times M)^2 + SV \times |F|^2 \times M)$.


4 Evaluation 


In this section, we present an experimental evaluation of FPH, aiming to answer 
the following research questions: (RQ1) How effective is FPH in performing 
non-commutativity analysis of feature-based systems? (RQ2) How accurate is 
FPH’s non-commutativity analysis? (RQ3) How efficient is FPH compared to 
state-of-the-art tools for performing non-commutativity analysis? (RQ4) How 
well does FPH scale as the number of fragments increases? 


Tool Support. We have implemented our methodology (Sect.3) as follows. The 
Separate process is implemented on top of FeatureHouse’s composition operator
in Java. We use the parsing process provided by FeatureHouse [4] to
separate features into the FPH representation, adding about 200 LOC.

The main process to check commutativity is implemented as a Python script 
in about 250 LOC. The first two parts of the commutativity check are directly 
implemented in the script. The third one, Check shared variables, requires con- 
sidering possible aliases of feature-based Java programs. For this check, we have 
implemented a Java program, FPH_varsAnalysis, that calls Soot [21] to build the 
call graph and analyze each reachable method. FPH_varsAnalysis is an
interprocedural, context-insensitive points-to analysis that, given two feature fragments
that superimpose the same method, checks whether a variable of the same type 
is written by at least one of them and read or written by the other. Since fea- 
ture fragments cannot be compiled by themselves (and thus Soot cannot be used 
on them), in order to do alias analysis, our program requires a representation 
that consists of the base system and all possible features. This representation 
is readily available for systems from the SPLVerifier repository since it uses a 
family-based approach to analysis. We generate a similar representation for all 
other systems used in our experiments. 
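The conflict condition used by FPH_varsAnalysis (something written by one fragment and read or written by the other) is what makes read/read sharing harmless. A minimal sketch of that final comparison, assuming the read and write sets have already been produced by a points-to analysis: here abstract locations are plain strings such as the hypothetical `Elevator.floor`, whereas the actual tool works with Soot's results and variable types.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the final read/write comparison of the shared variables check.
final class SharedVariables {
    // Abstract locations read and written by one fragment (assumed to come
    // from a points-to analysis; here they are simply given).
    record Accesses(Set<String> reads, Set<String> writes) {}

    private static Set<String> touched(Accesses x) {
        Set<String> all = new HashSet<>(x.reads());
        all.addAll(x.writes());
        return all;
    }

    // Conflict iff some location is written by one fragment and read or
    // written by the other; sharing that is read-only on both sides is harmless.
    static boolean conflict(Accesses a, Accesses b) {
        return !Collections.disjoint(a.writes(), touched(b))
            || !Collections.disjoint(b.writes(), touched(a));
    }
}
```

The check is symmetric, so the order in which the two fragments are passed does not matter.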


Models and Methods. We have applied FPH to 29 case studies written in 
Java. In the first five columns of Table 1, we summarize the information about 
these systems. The first six have been considered by SPLVerifier [6] — a tool 
for checking whether a software product line (SPL) satisfies its feature spec- 
ifications. SPLVerifier includes sample-based, product-based and family-based 
analyses and assumes that the order in which features should be composed is 
provided. The SPLVerifier examples came with specifications given by aspects 
woven at base system points, with an exception thrown if the state violates an 
expected property. The rest of our case studies are SPLs from the FeatureHouse 
repository [4]. 

We were unable to identify other techniques for analyzing feature commu- 
tativity of Java programs. Plath and Ryan [25] and Atlee et al. [8] compare 
different composition orders but handle only state machines. SPLVerifier [6]
represents the state of the art in verification of feature-based systems expressed in Java,
but it is not designed to do non-commutativity analysis. In the absence of alter- 
native tools, we adapted SPLVerifier to the task of finding non-commutativity 
violations to be able to compare with FPH. 

We conducted two experiments to evaluate FPH and to answer our research questions. In the first, we ran SPLVerifier on the first six systems in Table 1 to identify non-commutativity interactions; all properties that came with these systems satisfied the pattern in Sect. 2 and thus were appropriate for commutativity detection. Since SPLVerifier is designed to check products against a set of specifications, we had to define what a commutativity check means in this context. For a pair of features, SPLVerifier detects a commutativity violation if the provided property produces different values when these features are composed in different orders. During this check, SPLVerifier considers the composition of all other features of the system in all possible orders and thus can identify two-way, three-way, etc. feature interactions, if applicable. We measured the time taken by SPLVerifier and the number of interactions found.
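A minimal sketch of the commutativity check defined above, restricted to a single pair of features. It is not SPLVerifier's implementation: we assume, hypothetically, that a feature is a function on a program state (a dict), that composition applies features in sequence, and that a property is a Boolean function of the final state. The sketch also omits composing all other features in all orders.

```python
from typing import Callable, Dict

State = Dict[str, int]
Feature = Callable[[State], State]   # hypothetical model: a feature transforms the state
Property = Callable[[State], bool]   # hypothetical model: a property observes the final state


def compose(base: State, *features: Feature) -> State:
    """Apply the features to a copy of the base state, in the given order."""
    state = dict(base)
    for feature in features:
        state = feature(state)
    return state


def violates_commutativity(base: State, f: Feature, g: Feature, prop: Property) -> bool:
    """Report a violation iff the property differs between the orders f;g and g;f."""
    return prop(compose(base, f, g)) != prop(compose(base, g, f))


# Hypothetical features: two write the same variable, one writes a different one.
def set_limit_low(s: State) -> State:
    return {**s, "limit": 2}


def set_limit_high(s: State) -> State:
    return {**s, "limit": 8}


def bump_floor(s: State) -> State:
    return {**s, "floor": s["floor"] + 1}


def limit_is_high(s: State) -> bool:
    return s["limit"] > 5


base = {"limit": 0, "floor": 0}
print(violates_commutativity(base, set_limit_low, set_limit_high, limit_is_high))  # True
print(violates_commutativity(base, set_limit_low, bump_floor, limit_is_high))      # False
```

The check only sees differences that the property can observe: with a property that inspects only `floor`, the first pair would not be reported even though the final states differ, which mirrors SPLVerifier's dependence on the given specifications.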

In the second experiment, we checked all 29 systems using FPH to identify non-commutativity interactions. We measured the number of feature pairs that required checking for shared variables, the time the analysis took, and the precision of FPH in finding interactions. Because our tool relies on Soot's unsound call graph construction [7], we were unable to establish ground truth for non-commutativity analysis in cases where FPH required the shared variables check. We therefore measured the precision of our analysis by manually analyzing the validity of every interaction found by FPH. We also calculated SPLVerifier's relative recall, i.e., the fraction of the non-commutativity-related interactions detected by FPH that were also detected by SPLVerifier. We did not encounter any interactions that were detected by SPLVerifier but not by FPH.
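Both metrics are small set computations. The following sketch is illustrative only: the feature names are hypothetical placeholders, and the numbers merely mirror the Minepump row of Table 1 (three FPH interactions, all confirmed, two of them also found by SPLVerifier). Returning `None` corresponds to the dash used in the table when no interactions were detected.

```python
from typing import FrozenSet, Optional, Set

Interaction = FrozenSet[str]  # an unordered pair of feature names


def pair(a: str, b: str) -> Interaction:
    return frozenset((a, b))


def precision(reported: Set[Interaction], confirmed: Set[Interaction]) -> Optional[float]:
    """Share of reported interactions confirmed as real; None (a dash) if nothing was reported."""
    if not reported:
        return None
    return len(reported & confirmed) / len(reported)


def relative_recall(fph_found: Set[Interaction], splv_found: Set[Interaction]) -> Optional[float]:
    """Share of the interactions detected by FPH that SPLVerifier also detected."""
    if not fph_found:
        return None
    return len(fph_found & splv_found) / len(fph_found)


# Hypothetical feature names with the Minepump counts from Table 1.
fph = {pair("F1", "F2"), pair("F1", "F3"), pair("F4", "F5")}
confirmed = set(fph)
splv = {pair("F1", "F2"), pair("F4", "F5")}
print(precision(fph, confirmed))             # 1.0
print(round(relative_recall(fph, splv), 2))  # 0.67
```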

When the shared variables check is not necessary, our technique is sound: if we inform the user that two features are commutative, they certainly are, and there is no need to define an order between them. As shown below, soundness was affected for only a small number of feature pairs. Moreover, advances in static analysis techniques may improve our results for those cases in the future. Our experiments were performed on a virtual machine with 2 GB of RAM, running on a dual-core 1.3 GHz Intel Core i5 machine.


Results. Columns 6-10 of Table 1 summarize the results of our experiments, including, for the first six examples, SPLVerifier's precision and (relative) recall. “SV pairs” is the number of feature pairs for which the shared variables check was required. A dash in a precision column means that the measurement is not meaningful because no interactions were detected. For example, SPLVerifier does not detect any non-commutativity interactions for Email, and FPH does not find any for EPL. FPH found a number of instances of non-commutativity, such as the one between ExecutiveFloor and TwoThirdsFull in the Elevator System. Only one SV check was required for this system (for the Empty and Weight features). Without our technique, the user would need to provide an order between the five features of the Elevator System, that is, specify 20 (5 × 4) ordering constraints. FPH allows us to conclude that ExecutiveFloor and TwoThirdsFull do not commute, that Empty and Weight likely commute (although this is not guaranteed), and that all other pairs of features do commute. Thus, only two feature pairs required further analysis by the user.

Table 1. Overview of case studies.

System              | LOC   | #Feat. | #Frag. | Description                     | #Comm. interactions | SV pairs | FPH precision | SPLV precision | SPLV rel. recall
Elevator            | 799   | 5      | 19     | Our running example             | 1                   | 1        | 1             | 1              | 1
Email               | 938   | 8      | 55     | Email communication suite       | 3                   | 9        | 1             | -              | 0
Minepump            | 425   | 6      | 10     | Water pump in mining operation  | 3                   | 0        | 1             | 1              | 0.67
GPL                 | 2510  | 17     | 109    | Graph product line              | 2                   | 38       | 0.1           | -              | 0
AJStats             | 15311 | 19     | 128    | Statistics for AspectJ          | 26                  | 136      | 1             | -              | 0
ZipMe               | 5479  | 12     | 229    | Zip compression library         | 5                   | 0        | 1             | -              | 0
BerkeleyDB          | 64652 | 98     | 2667   | Embedded database engine        | 198                 | 1        | 1
ChatSystem/Burke    | 614   | 7      | 51     | Network client and server       | 2                   | 14       | 0.33
ChatSystem/Dreiling | 938   | 5      | 78     | Network client and server       | 3                   | 0        | 1
ChatSystem/Becker   | 651   | 6      | 42     | Network client and server       | 5                   | 2        | 1
ChatSystem/Weiss    | 931   | 9      | 23     | Network client and server       | 4                   | 5        | 0.75
ChatSystem/Schink   | 873   | 6      | 50     | Network client and server       | 4                   | 1        | 1
ChatSystem/Rehn     | 862   | 6      | 58     | Network client and server       | 14                  | 2        | 1
ChatSystem/Thuem    | 544   | 7      | 34     | Network client and server       | 1                   | 2        | 1
EPL                 | 99    | 10     | 22     | Arithmetic expression evaluator | 0                   | 1        | -
GameOfLife          | 1656  | 14     | 154    | Computer game                   | 5                   | 0        | 1
Graph               | 467   | 4      | 26     | Graph library                   | 0                   | 6        | -
Notepad/Quark       | 1397  | 11     | 106    | Text editor                     | 20                  | 21       | 1
Notepad/Delaware    | 1654  | 5      | 122    | Text editor                     | 10                  | 0        | 1
Notepad/Wellington  | 1522  | 3      | 38     | Text editor                     | 0                   | 0        | -
Notepad/Svetoslav   | 1627  | 5      | 83     | Text editor                     | 0                   | 0        | -
Notepad/Wehrman     | 1716  | 4      | 83     | Text editor                     | 6                   | 6        | 1
Notepad/Guimbarda   | 1586  | 14     | 229    | Text editor                     | 91                  | 0        | 1
Notepad/Robison     | 1404  | 9      | 90     | Text editor                     | 0                   | 0        | -
PKJab               | 4994  | 7      | 99     | Chat network client             | 2                   | 0        | 1
Raroscope           | 428   | 4      | 18     | Compression library             | 0                   | 0        | -
Sudoku              | 1850  | 6      | 103    | Computer game                   | 5                   | 4        | 1
TankWar             | 3184  | 19     | 213    | Computer game                   | 71                  | 27       | 0.97
Violet              | 9789  | 87     | 912    | UML model editor                | 35                  | 28       | 1
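The arithmetic in the Elevator discussion can be checked with a short sketch: without commutativity information, every ordered pair of distinct features needs a constraint, i.e., n(n - 1) of them, and FPH narrows this down to the pairs it cannot prove commutative. The fifth Elevator feature is not named in this section, so the sketch uses the hypothetical placeholder `Other`.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple


def ordering_constraints(num_features: int) -> int:
    """Ordered pairs of distinct features, i.e., n * (n - 1)."""
    return num_features * (num_features - 1)


def pairs_needing_attention(
    features: Iterable[str],
    non_commuting: Set[FrozenSet[str]],
    uncertain: Set[FrozenSet[str]],
) -> List[Tuple[str, str]]:
    """Unordered pairs not proven commutative: definite violations plus SV-based verdicts."""
    flagged = non_commuting | uncertain
    return [(a, b) for a, b in combinations(sorted(features), 2) if frozenset((a, b)) in flagged]


elevator = ["Empty", "ExecutiveFloor", "TwoThirdsFull", "Weight", "Other"]  # "Other" is a placeholder
non_commuting = {frozenset(("ExecutiveFloor", "TwoThirdsFull"))}
uncertain = {frozenset(("Empty", "Weight"))}  # decided by the shared variables check only

print(ordering_constraints(5), ordering_constraints(9))  # 20 72
print(pairs_needing_attention(elevator, non_commuting, uncertain))
# [('Empty', 'Weight'), ('ExecutiveFloor', 'TwoThirdsFull')]
```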

The Minepump system did not require the shared variables check at all, so the FPH analysis for it is sound; all three interactions found were manually confirmed to be “real” (thus, precision is 1). ChatSystem/Weiss has nine features, which would require specifying 72 (9 × 8) ordering constraints. Four non-commutativity cases were found, all using the shared variables check, but only three were confirmed as “real” via manual inspection (thus, precision is 0.75). We conclude that FPH is effective in discovering non-commutativity violations and in proving their absence (RQ1).

We now turn to the accuracy of FPH w.r.t. finding non-commutativity violations (RQ2). From Table 1, we observe that for the Elevator System, both FPH and SPLVerifier correctly detect a non-commutativity interaction. For the Minepump system, SPLVerifier finds only two of the three interactions found by FPH (relative recall = 0.67). For Email, AJStats, ZipMe, and GPL, the specifications available in SPLVerifier do not allow detecting any of the non-commutativity interactions found by FPH (relative recall = 0).

GPL was a problematic case for FPH, affecting its precision. The graph algorithms in this example take a set of vertices and create and maintain an internal data structure (e.g., to calculate the vertices involved in the shortest path or in a strongly connected component). Because of this data structure, our analysis found a number of possible shared variables and incorrectly deemed several features non-commutative. For example, the algorithms that find cycles or the shortest path between two nodes access the same set of vertices but change different fields, and thus are commutative. One way to avoid such false positives would be to implement a field-sensitive alias analysis. While more precise, it would be significantly slower than our current shared variables analysis.

Fig. 6. (a) Number of FPH_varsAnalysis calls per system; (b) Time spent by FPH_varsAnalysis per system; (c) Percentage of non-commutativity checks where BP or SV analyses were applied last. (Color figure online)
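The GPL false positives can be illustrated by comparing write sets with and without fields. This is a hypothetical simplification, not FPH_varsAnalysis: we assume each feature's writes have already been resolved by an alias analysis into (abstract object, field) pairs, and the object and field names below are invented for the example.

```python
from typing import Set, Tuple

Write = Tuple[str, str]  # (abstract object, field), assumed to come from an alias analysis


def conflict_field_insensitive(writes_a: Set[Write], writes_b: Set[Write]) -> bool:
    """Conflict if both features write any field of the same abstract object."""
    return bool({obj for obj, _ in writes_a} & {obj for obj, _ in writes_b})


def conflict_field_sensitive(writes_a: Set[Write], writes_b: Set[Write]) -> bool:
    """Conflict only if both features write the same field of the same abstract object."""
    return bool(writes_a & writes_b)


# Hypothetical GPL-like write sets: same vertex objects, different fields.
cycle = {("vertices", "visited")}
shortest_path = {("vertices", "distance")}

print(conflict_field_insensitive(cycle, shortest_path))  # True  (spurious shared variable)
print(conflict_field_sensitive(cycle, shortest_path))    # False (fields are disjoint)
```

Both toy checks are equally cheap; the cost concern in the text is about the alias analysis that would have to track fields, which this sketch simply takes as given.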

For the remaining systems, either the interactions reported by FPH were “real” or, where FPH returned false positives (ChatSystem/Burke, ChatSystem/Weiss, and TankWar), these were due to the imprecision of the alias analysis. Thus, given SPLVerifier's set of properties, FPH always exhibited the same or better precision and recall than SPLVerifier. Moreover, for all but three of the remaining systems, FPH exhibited perfect precision. We thus conclude that FPH is very accurate (RQ2).

We now turn to the efficiency of our analysis (RQ3). Separating features into behavior and composition usually took under 5 s. The outlier was BerkeleyDB, which took about a minute due to its number of features and especially of fragments (BerkeleyDB has 2667 fragments, whereas Violet has 912 and the other systems have at most 229). In general, the time taken by FPH's commutativity check was strongly influenced by the number of calls to FPH_varsAnalysis. Figure 6a shows the number of calls to FPH_varsAnalysis as the number of features increases. For example, BerkeleyDB has 98 features and required only one call to FPH_varsAnalysis, while AJStats has 19 features and required 136 of these calls. More features do not necessarily imply more of these checks: Violet and BerkeleyDB required fewer checks than AJStats, TankWar, and GPL, yet they have more features.

Figure 6b shows the overall time spent by FPH_varsAnalysis for each analyzed system. Notepad/Quark and Violet took more time (1192 s and 1270 s, respectively) than GPL (1084 s) because these systems call Java GUI libraries (awt and swing), which results in a larger call graph than for GPL. A similar situation occurred when checking TankWar (1790 s) and AJStats (1418 s). FPH took under 200 s in most cases and less than 35 min in the worst case to analyze non-commutativity (see Fig. 6b). FPH was efficient because FPH_varsAnalysis was required for only a relatively small fraction of pairs of feature fragments. We plot this information in Fig. 6c: for each analyzed system, it shows the percentage of pairs of feature fragments for which behavior preservation (BP) or shared variables (SV) was the last check conducted by FPH. We omit the systems for which these checks were required for less than 1% of feature pairs. The figure shows that calls to FPH_varsAnalysis (to compute SV, in blue) were not required for over 96% of feature pairs.
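The Fig. 6c statistic can be sketched as a staged pipeline in which the expensive shared-variables step only runs for pairs that survive cheaper checks. The ordering and semantics of the checks here are assumptions for illustration, not a description of FPH's implementation: we assume that fragments composed at different locations commute, that a pair in which either fragment does not preserve the original behavior is decided by the BP check, and that only the remaining pairs reach the SV check. The fragment data and location names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Fragment:
    location: str                         # hypothetical: where the fragment is composed
    preserves: bool                       # hypothetical: does it preserve the original behavior?
    writes: FrozenSet[str] = frozenset()  # variables it may write


def classify(f: Fragment, g: Fragment) -> Tuple[str, str]:
    """Return (verdict, last check run) for a pair of fragments, cheapest checks first."""
    if f.location != g.location:
        return "commute", "location"       # never composed at the same point
    if not (f.preserves and g.preserves):
        return "non-commute", "BP"         # assumed: a non-preserving fragment makes order matter
    shared = f.writes & g.writes           # stand-in for the expensive shared-variables analysis
    return ("non-commute" if shared else "commute"), "SV"


def last_check_percentages(fragments: List[Fragment]) -> Dict[str, float]:
    """Percentage of fragment pairs for which each check was the last one run."""
    counts = Counter(classify(f, g)[1] for f, g in combinations(fragments, 2))
    total = sum(counts.values())
    return {check: 100.0 * n / total for check, n in counts.items()}


fragments = [
    Fragment("Elevator.move", True, frozenset({"load"})),
    Fragment("Elevator.move", True, frozenset({"door"})),
    Fragment("Elevator.stop", False, frozenset({"floor"})),
    Fragment("Elevator.stop", True, frozenset({"floor"})),
]
print(last_check_percentages(fragments))
```

In this toy input, four of the six pairs are settled by the location check, and only one pair reaches the SV step.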

To check for non-commutativity violations, SPLVerifier needs to check all possible products, which is infeasible in practice. We therefore set a timeout of one hour, within which SPLVerifier was able to check 110 products for Elevator, 57 for Email, 151 for Minepump, 3542 for GPL, 2278 for AJStats, and 1269 for ZipMe. For each of these systems, a separate check is required for every specification, so the same product is checked more than once if more than one specification exists. Even though GPL, AJStats, and ZipMe are larger systems with more features, they have fewer properties associated with them, and therefore we were able to check more of their products within one hour. Thus, to answer RQ3, FPH was much more efficient than SPLVerifier in performing non-commutativity analysis: SPLVerifier was only able to analyze products containing the base system and at most three features before reaching the timeout. Moreover, FPH can guarantee commutativity (when the shared variables check is not needed), whereas SPLVerifier cannot, because its verdicts depend on the properties given.
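To see why exhaustive checking does not scale, the number of ordered products can be bounded with a short computation. This is a rough upper bound under a simplifying assumption of ours: every subset of features may be composed onto the base system in every order, and feature-model constraints (which would prune many products) are ignored.

```python
from math import perm
from typing import Optional


def ordered_products(num_features: int, max_features: Optional[int] = None) -> int:
    """Sum over k of n!/(n-k)!: ordered selections of up to max_features features."""
    limit = num_features if max_features is None else min(num_features, max_features)
    return sum(perm(num_features, k) for k in range(limit + 1))


print(ordered_products(5))             # 326    (e.g., Elevator's five features)
print(ordered_products(5, 3))          # 86     (base system plus at most three features)
print(ordered_products(8))             # 109601 (e.g., Email's eight features)
print(ordered_products(98) > 10**150)  # True   (BerkeleyDB's 98 features)
```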

Our experiments also allow us to conclude that our technique is highly scalable (RQ4). For example, the percentage of calls to FPH_varsAnalysis is small and increases only slightly as the number of fragments grows (see Fig. 6a and b).


Threats to Validity. Our results may not generalize to other feature-based systems expressed in Java. We believe we have mitigated this threat by running our tool on examples provided by FeatureHouse, which include a variety of systems of different sizes that we consider representative of typical Java feature-based systems. As mentioned earlier, we used SPLVerifier in a way not intended by its designers. We also had no ground truth when the shared variables check was required; for those few cases, we calculated SPLVerifier's relative rather than actual recall.
5 Related Work


In this section, we survey related work on modular feature definitions, feature 
interaction detection and commutativity-related feature interactions. 


Modular Feature Definitions. A number of approaches to modular feature definitions have been proposed. For example, the composition language in [8] includes the states in which a feature is to be composed (similar to our fg.location) and the feature behavior (similar to our fb.body). Other work [4,9,10] uses superimposition of FSTs to obtain the composed system. In [14,25], new variables are added or existing ones are changed with particular kinds of composition (either executing a new behavior when a particular variable is read, or adding a check before a particular variable is set). These approaches treat the feature behavior together with its composition specification. In contrast, our approach automatically separates a feature definition into a behavioral part and a composition part, enabling a more scalable and efficient analysis.


Feature Interaction Detection. Calder et al. [13] survey approaches for analyzing feature interactions. Interactions occur because the behavior of one feature is affected by others, e.g., by adding non-deterministic choices that result in conflicting states, by adding infinite loops that affect termination, or by breaking assertions that the feature satisfies on its own. Checking these properties, as well as those discussed in more recent work [8,15,18,19], requires building the entire SPL. Additionally, all these approaches consider state machine representations, which are not available for most SPLs, and extracting them from code is non-trivial. SPLLift [12] is a family-based static analysis tool not directly intended to find interactions; any change in a feature would require building the family-based representation again, whereas we conduct modular checks between features. Spek [26] is a product-based approach that analyzes whether the different products satisfy provided feature specifications. It does not check whether the features commute.


Non-commutativity-Related Feature Interactions. Prior work [5,8] has also looked at detecting non-commutativity-related feature interactions. The work in [5] presents a feature algebra and shows why composition (by superimposition) is, in general, not commutative. The work in [8] analyzes feature commutativity by checking for bisimulation, and the result of the composition is a state machine representing the product. Neither work reports on a tool or applies to systems expressed in Java.


Aspect-Oriented Approaches. Störzer et al. [27] present a tool prototype for detecting precedence-related interactions in AspectJ. Technically, this approach is very similar to ours: it (a) detects which advice is activated at the same place; (b) checks whether the proceed keyword and exceptions are present; and (c) analyzes read and written variables. Yet, its focus is on aspects, and often many aspects are required to implement a single feature [23]. This implies that for $m$ features with an average of $n$ aspects each, the analysis in [27] needs to make $O((m \cdot n)^2)$ checks, while our approach requires $O(m^2)$ checks. Therefore, the approach in [27] might be significantly slower than FPH. The approach in [1] analyzes interactions of aspects given by composition filters by checking for simulation among all the different orderings in which advice with shared joinpoints can be composed. As the number of pieces of advice with shared joinpoints increases, that approach considers every possible ordering, while we keep the analysis pairwise. The works [16,20] define modular techniques to check properties of aspect-oriented systems. The approach in [16] uses assume-guarantee reasoning to verify and detect interactions even when aspects can be activated within other aspects; it does not require an order but does require specifications to detect whether a certain composition order would violate them. The approach in [20] uses the explicit CTL model-checking algorithm to distribute global properties into local properties to be checked for each aspect, which yields a modular check. In addition to requiring specifications, this technique assumes AspectJ's ordering of aspects.
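The gap between the two bounds can be made concrete by counting unordered pairs. The numbers below are a hypothetical illustration (m = 10 features with n = 3 aspects each), not measurements:

```latex
\begin{aligned}
\binom{m}{2} &= \tfrac{m(m-1)}{2} \in O(m^2), &
\binom{m n}{2} &= \tfrac{mn(mn-1)}{2} \in O\big((m \cdot n)^2\big),\\
m = 10,\ n = 3{:}\quad \binom{10}{2} &= 45, & \binom{30}{2} &= 435.
\end{aligned}
```

Here the aspect-level analysis would perform roughly $n^2$ times as many pairwise checks (about 9.7 times in this instance).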


6 Conclusion and Future Work 


In this paper, we presented a compositional approach for checking non-commutativity of features in systems expressed in Java. The method is based on determining whether pairs of features can write to the same variables, in which case the order in which the features are composed with the base system may determine the values of those variables. The method is complementary to other feature interaction detection approaches, such as [6,12], in that it helps build an order in which features are to be composed: when two features commute, they can be composed in any order. In addition, the method helps detect a number of feature interactions. It is implemented in our framework FPH (Mr. Feature Potato Head). FPH does not require specifying properties of features and does not need to consider the entire set of software products every time a feature is modified. Our extensive empirical evaluation of FPH shows that the approach is highly scalable and effective. In the future, we plan to further evaluate our technique, handle languages other than Java, and experiment with more precise methods for determining shared variables.
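How commutativity information supports building a composition order can be sketched with a topological sort. This is a hypothetical illustration rather than part of FPH: we assume the user supplies a "before" constraint only for pairs reported as non-commuting (the direction used below is invented), and any order consistent with those constraints is acceptable because the remaining pairs commute. The sketch uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple


def composition_order(features: Iterable[str], before: Iterable[Tuple[str, str]]) -> List[str]:
    """Return one feature order satisfying every (first, second) constraint.

    Raises graphlib.CycleError if the constraints contradict each other.
    """
    predecessors: Dict[str, Set[str]] = {f: set() for f in features}
    for first, second in before:
        predecessors.setdefault(first, set())
        predecessors.setdefault(second, set()).add(first)
    return list(TopologicalSorter(predecessors).static_order())


features = ["Empty", "ExecutiveFloor", "TwoThirdsFull", "Weight"]
order = composition_order(features, [("ExecutiveFloor", "TwoThirdsFull")])
print(order)  # ExecutiveFloor precedes TwoThirdsFull; all other pairs are unconstrained

try:
    composition_order(features, [("Empty", "Weight"), ("Weight", "Empty")])
except CycleError:
    print("contradictory constraints")
```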
Abstract. Software product lines continuously undergo model transformations, such as refactorings, refinements, and translations. In product
line transformations, the dedicated management of variability can help 
to control complexity and to benefit maintenance and performance. How- 
ever, since no existing approach is geared for situations in which both 
the product line and the transformation specification are affected by 
variability, substantial maintenance and performance obstacles remain. 
In this paper, we introduce a methodology that addresses such multi- 
variability situations. We propose to manage variability in product 
lines and rule-based transformations consistently by using annotative 
variability mechanisms. We present a staged rule application technique 
for applying a variability-intensive transformation to a product line. This 
technique enables considerable performance benefits, as it avoids enu- 
merating products or rules upfront. We prove the correctness of our 
technique and show its ability to improve performance in a software 
engineering scenario. 


1 Introduction 


Software product line engineering [1] enables systematic reuse of software artifacts through the explicit management of variability. Representing a software product line (SPL) in terms of functionality increments called features, and mapping these features to development artifacts such as domain models and code, makes it possible to generate custom-tailored products on demand by retrieving the corresponding artifacts for a given feature selection. Companies such as Bosch, Boeing, and Philips use SPLs to deliver tailor-made products to their customers [2].

Despite these benefits, a growing amount of variability leads to a combinatorial explosion of the product space and, consequently, to severe challenges. Notably, this applies to software engineering tasks such as refactorings [3], refinements [4], and evolution steps [5], which, to support systematic management, are often expressed as model transformations. When applying a given model transformation to an SPL, a key challenge is to avoid enumerating and considering all possible products individually. To this end, Salay et al. [6] have proposed an algorithm that “lifts” regular transformation rules to a whole product line. The algorithm transforms the SPL, represented as a variability-annotated domain model, in such a way as if each product had been considered individually.

© The Author(s) 2018
A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 337-355, 2018.
https://doi.org/10.1007/978-3-319-89363-1_19

338 D. Strüber et al.

Yet, in complex transformation scenarios as increasingly found in practice [7], it is not only the considered models that include variations: the transformation system can contain variability as well, for example, due to desired optional behavior of rules, or due to rule variants arising from the sheer complexity of the involved meta-models. While a number of works [8-10] support systematic reuse to improve maintainability, variability-based model transformation (VB) [11,12] also aims to improve performance when a transformation system with many similar rules is executed. To this end, these rules are represented as a single rule with variability annotations, called a VB rule. During rule application, a dedicated VB rule application technique [13] saves redundant effort by considering common rule parts only once. In summary, for cases where either the model or the transformation system alone contains variability, solid approaches are available.

However, a more challenging case occurs when a variability-intensive transformation is applied to an SPL. In this multi-variability setting, where both the input model and the specification of a transformation contain variability, the existing approaches fall short of dealing with the resulting complexity: one can either consider all rules, so they can be “lifted” to the product line, or consider all products, so they become amenable to VB model transformation. Both approaches are undesirable, as they require enumerating an exponentially growing number of artifacts and, therefore, threaten the feasibility of the transformation.

In this paper, we introduce a methodology for SPL transformations inspired by the uniformity principle [14], a tenet that suggests handling variability consistently throughout all software artifacts. We propose to capture the variability of SPLs and transformations using variability-annotated domain models and rules. Model and rule elements are annotated with presence conditions, specifying the conditions under which the annotated elements are present. The presence conditions of model and rule elements are specified over two separate sets of features, representing SPL and rule variability. Annotated domain models and rules can be created manually using available editor support [15,16], or automatically from existing products and rules by using merge-refactoring techniques [17,18].

Given an SPL and a VB rule, as shown in Fig. 1, we provide a staged rule application technique (black arrow) for applying the VB rule to the SPL. In contrast to the state of the art (shown in gray), enumerating products or rules upfront is not required. By adopting this technique, existing tools that use transformation technology, such as refactoring engines, may benefit from improved performance.


Specifically, we make the following contributions: 


— We introduce a staged technique for applying a VB rule to an SPL. Our 
technique combines core principles of VB rule applications and lifting, while 
avoiding their drawbacks w.r.t. enumerating all products or rules upfront. 

— We formally prove correctness of this technique by showing its equivalence to 
the application of each “flattened” product to each “flattened” rule. 

— We present an algorithm for implementing the rule application technique. 
Fig. 1. Overview


— We evaluate the usefulness of our technique by studying its performance in a 
substantial number of cases within a software engineering scenario. 


Our work builds on the underlying framework of algebraic graph transformation (AGT) [19]. AGT is one of the standard model transformation language paradigms [20]; in addition, it has recently gained momentum as an analysis paradigm for other widespread paradigms and languages such as ATL [21]. We focus on the annotative approach to variability. Suitable converters to and from alternative approaches, such as the composition-based one [22], may allow our technique to be used in other cases as well.

The rest of this paper is structured as follows: We motivate and explain our 
contribution using a running example in Sect. 2. Section 3 revisits the necessary 
background. Section 4 introduces the formalization of our new rule application 
technique. The algorithm and its evaluation are presented in Sects. 5 and 6, respectively. In Sect. 7, we discuss related work, before we conclude in Sect. 8.


2 Running Example 


In this section, we introduce SPLs and variability-based model transformation by 
example, and motivate and explain our contribution in the light of this example. 


Software Product Lines. An SPL represents a collection of models that are similar to, but different from, each other. Figure 2 shows a washing machine controller SPL in an annotative representation, comprising an annotated domain model and a feature model. The feature model [23] specifies a root feature Wash with three optional children Heat, Delay, and Dry, where Heat and Delay are mutually exclusive. The domain model is a statechart diagram specifying the behavior of the controller SPL based on the states Locking, Waiting, Washing, Drying, and UnLocking, with transitions between them. Presence conditions, shown in gray labels, denote the condition under which an annotated element is present. These conditions are used to specify variations in the execution behavior.

Concrete products can be obtained from configurations, in which each optional feature is set to either true or false. A product arises by removing those elements whose presence condition evaluates to false in the given configuration. For instance, selecting Delay and deselecting Heat and Dry yields the product shown on the right of Fig. 2. The SPL has six configurations, and thus six products, in total, since Wash is mandatory and Delay excludes Heat.

Fig. 2. Washing machine controller product line and product (adapted from [6]).
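The count of six configurations follows directly from the feature model described above. The sketch below enumerates all assignments to the three optional features and keeps those satisfying the exclusion between Heat and Delay; representing configurations as Python dicts is our own choice, and the presence conditions of Fig. 2 are not encoded.

```python
from itertools import product
from typing import Dict, List

Config = Dict[str, bool]
OPTIONAL = ["Heat", "Delay", "Dry"]  # optional children of the mandatory root feature Wash


def is_valid(config: Config) -> bool:
    """Feature constraints of the washing machine SPL: Wash is mandatory, Heat excludes Delay."""
    return config["Wash"] and not (config["Heat"] and config["Delay"])


def valid_configurations() -> List[Config]:
    configs = []
    for values in product([False, True], repeat=len(OPTIONAL)):
        config = {"Wash": True, **dict(zip(OPTIONAL, values))}
        if is_valid(config):
            configs.append(config)
    return configs


configs = valid_configurations()
print(len(configs))  # 6
```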


Variability-Based (VB) Model Transformation. In complex model transformation scenarios, developers often create rules that are similar to, but different from, each other. As an example, consider the two rules foldEntryActions and foldExitActions (Fig. 3), called A and B for short. These rules express a “fold” refactoring for statechart diagrams: if a state has two incoming or outgoing transitions with the same action, these actions are to be replaced by an entry or exit action of the state, respectively. The rules have a left- and a right-hand side (LHS, RHS). The LHS specifies a pattern to be matched to an input graph, and the difference between the LHS and the RHS specifies a change to be performed for each match, such as removing transition actions and adding entry and exit actions.
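The effect of rule A (foldEntryActions) on a single product can be sketched on a simple statechart encoding. Everything here is a simplified, hypothetical stand-in for the graph-based rule of Fig. 3: transitions are dicts with `src`, `tgt`, and `action` keys, entry actions are stored in a separate dict, the rule is applied at the first match only, and the state and action names (borrowed from the washing machine example) are chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

Transition = Dict[str, Optional[str]]  # keys: "src", "tgt", "action"


def find_entry_match(transitions: List[Transition]) -> Optional[Tuple[int, int]]:
    """LHS of foldEntryActions: two transitions entering the same state with the same action."""
    for i, t1 in enumerate(transitions):
        for j in range(i + 1, len(transitions)):
            t2 = transitions[j]
            if t1["tgt"] == t2["tgt"] and t1["action"] is not None and t1["action"] == t2["action"]:
                return i, j
    return None


def fold_entry_actions(entry: Dict[str, Optional[str]], transitions: List[Transition]) -> bool:
    """Apply the rule at the first match; return False if the LHS does not match."""
    match = find_entry_match(transitions)
    if match is None:
        return False
    i, j = match
    action = transitions[i]["action"]
    transitions[i]["action"] = None       # RHS: delete both transition actions ...
    transitions[j]["action"] = None
    entry[transitions[i]["tgt"]] = action  # ... and create an entry action on the target state
    return True


entry = {"Locking": None, "Waiting": None, "Washing": None}
transitions = [
    {"src": "Locking", "tgt": "Washing", "action": "startWash"},
    {"src": "Waiting", "tgt": "Washing", "action": "startWash"},
]
print(fold_entry_actions(entry, transitions), entry["Washing"])  # True startWash
```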

Rules A and B are simple; however, in a realistic transformation system, the number of required rules can grow exponentially with the number of variation points in the rules. To avoid a combinatorial explosion, a set of variability-intensive rules can be encoded into a single representation using a VB rule [12,18]. A VB rule consists of an LHS, an RHS, a feature model specifying a set of interrelated features, and presence conditions annotating LHS and RHS elements with a condition under which they are present. Individual “flat” rules are obtained via configuration, i.e., binding each feature to either true or false. In the VB rule A+B, the feature model specifies a root feature refactor with alternative child features foldEntry and foldExit. Since exactly one child feature has to be active at a time, two possible configurations exist. The two rules arising from these configurations are isomorphic to rules A and B.
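The two configurations of the VB rule A+B can be enumerated in the same way as product configurations. In this sketch, which uses our own dict encoding, the alternative group under `refactor` is expressed as "exactly one child selected", and each valid configuration is mapped to the flat rule it yields.

```python
from itertools import product
from typing import Dict, List

RULE_FEATURES = ["foldEntry", "foldExit"]  # alternative children of the root feature refactor


def is_valid_rule_config(config: Dict[str, bool]) -> bool:
    """Alternative group: exactly one child feature is active."""
    return sum(config[f] for f in RULE_FEATURES) == 1


def flat_rules() -> List[str]:
    """Map each valid rule configuration to the flat rule it yields (A or B)."""
    rules = []
    for values in product([False, True], repeat=len(RULE_FEATURES)):
        config = dict(zip(RULE_FEATURES, values))
        if is_valid_rule_config(config):
            rules.append("A (foldEntryActions)" if config["foldEntry"] else "B (foldExitActions)")
    return rules


print(flat_rules())  # ['B (foldExitActions)', 'A (foldEntryActions)']
```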


Problem Statement. Model transformations such as foldActions are usually designed to be applied to a concrete software product, represented by a single model. However, in various situations, it is desirable to extend the usage context to a set of models collected in an SPL. For example, during the batch refactoring of an SPL, all products should be refactored in a uniform way.

Fig. 3. Two rules and their encoding into a variability-based rule (adapted from [24]).

Variability is challenging for model transformation technologies. As illustrated in Table 1, products and rules need to be considered in many combinations. In our example, without dedicated variability support, the user needs to specify 6 products and 2 rules individually and trigger a rule application for each of the 12 combinations. A better strategy is enabled by VB model transformation: by applying the VB rule A+B, only 6 combinations need to be considered. Another strategy is to apply rules A and B to the SPL by lifting [6] them, leading to 2 combinations and the biggest improvement so far. Still, in more complex cases, all of these strategies are insufficient: since none of them avoids exponential growth in the number of optional SPL features ($\#F_P$) or optional rule features ($\#F_r$), the feasibility of the transformation is threatened.


Table 1. Approaches for dealing with multi-variability.

Approach                 | Independent combinations
                         | Example | General case
Naive                    | 12      | $2^{\#F_P + \#F_r}$
VB transformation [12]   | 6       | $2^{\#F_P}$
Lifting [6]              | 2       | $2^{\#F_r}$
Staged application (new) | 1       | 1
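The example column of this table follows from simple counts over the 6 products and 2 flat rules of the running example. The sketch below restates those counts; substituting $2^{\#F_P}$ and $2^{\#F_r}$ for the numbers of products and rules gives the general-case column, as upper bounds that ignore feature-model constraints.

```python
from typing import Dict


def independent_combinations(num_products: int, num_rules: int) -> Dict[str, int]:
    """Rule applications started from scratch under each strategy."""
    return {
        "naive": num_products * num_rules,  # every flat rule on every product
        "VB transformation": num_products,  # the VB rule on every product
        "lifting": num_rules,               # every flat rule lifted to the whole SPL
        "staged application": 1,            # the VB rule applied once to the whole SPL
    }


print(independent_combinations(6, 2))
# {'naive': 12, 'VB transformation': 6, 'lifting': 2, 'staged application': 1}

# General case (upper bounds): f_p optional SPL features, f_r optional rule features.
f_p, f_r = 3, 1
print(independent_combinations(2 ** f_p, 2 ** f_r)["naive"] == 2 ** (f_p + f_r))  # True
```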
Fig. 4. Staged rule application of a VB rule to a product line.


Solution Overview. To address this situation, we propose a staged rule application technique for applying a VB rule to an SPL. As shown in Fig. 4, this technique proceeds in three steps. In step 1, we consider the base rule, that is, the common portion of the rules encoded in the VB rule, and match its LHS to the full domain model, temporarily ignoring the domain model's presence conditions. For example, considering rule A+B, the LHS of the base rule contains precisely the states x1, x2, and x. A match to the domain model is indicated by dashed arrows. Using the presence conditions, we then determine whether the match can be mapped to at least one specific product. In step 2, we extend the identified base matches to full matches of the rules encoded in the VB rule. In the example, we would derive rules A and B; in general, to avoid fully flattening all involved rules, one can incrementally consider common subrules. An example match is denoted by dashed lines for the mappings of transitions and actions. In step 3, to perform rule applications based on the identified matches, we use lifting to apply the rule for which the match was found. Lifting transforms the domain model and its presence conditions in such a way as if each product had been considered individually. In the example, only products for the configuration {Delay = true; Heat = false} are amenable to the foldActions refactoring. Consequently, the new entry action startWash has the presence condition Delay, and other presence conditions are adjusted accordingly. Failure to find suitable matches, or to fulfill a certain condition during lifting (discussed later), allows the process to terminate early.
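The check at the end of step 1 (can a base match be mapped to some product?) amounts to asking whether the presence conditions of all matched elements are jointly satisfiable together with the feature model. The sketch below answers this by brute-force enumeration purely to stay dependency-free. Because avoiding enumeration is the point of the technique, a practical version would typically hand the formula to a SAT solver instead. The presence conditions in the example are hypothetical, not those of Fig. 2.

```python
from itertools import product
from typing import Callable, Dict, List

Config = Dict[str, bool]
Condition = Callable[[Config], bool]

FEATURES = ["Heat", "Delay", "Dry"]


def feature_model(c: Config) -> bool:
    """Washing machine constraint: Heat and Delay are mutually exclusive."""
    return not (c["Heat"] and c["Delay"])


def match_is_viable(presence_conditions: List[Condition]) -> bool:
    """True iff some valid configuration satisfies the presence conditions of all matched elements."""
    for values in product([False, True], repeat=len(FEATURES)):
        c = dict(zip(FEATURES, values))
        if feature_model(c) and all(pc(c) for pc in presence_conditions):
            return True
    return False


# Hypothetical matches: elements annotated Delay and "not Heat" can co-occur ...
print(match_is_viable([lambda c: c["Delay"], lambda c: not c["Heat"]]))  # True
# ... but elements annotated Heat and Delay never occur in the same product.
print(match_is_viable([lambda c: c["Heat"], lambda c: c["Delay"]]))      # False
```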

Performance-wise, the main benefit of this technique is twofold: First, using 
the termination criteria, we can exit the matching process early without con- 
sidering specifics of products and rule variants. This is particularly beneficial in 
situations where none or only few rules of a larger rule set are applicable most 
of the time, which is typically the case, for example, in translators. Second, even 
if we have to enumerate some rules in step 2, we do not have to start the match- 
ing process from scratch, since we can save redundant effort by extending the 
available base matches. Consequently, Table 1 gives the number of independent 
combinations (in the sense that rule applications are started from scratch) as 1. 
3 Background


We now introduce the necessary prerequisites of our methodology, starting with 
the double-pushout approach to algebraic graph transformation [19]. As the 
underlying structure, we assume the category of graphs with graph morphisms 
(referred to as morphisms from here), although all considerations are likely com- 
patible with additional graph features such as typing and attributes. 


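Definition 1 (Rules and applications). A rule $r = (L \xleftarrow{le} I \xrightarrow{ri} R)$ consists of graphs $L$, $I$ and $R$, called left-hand side, interface graph and right-hand side, respectively, and two injective morphisms $le$ and $ri$.

Given a rule $r$, a graph $G$, and a morphism $m : L \to G$, a rule application from $G$ to a graph $H$, written $G \Rightarrow_{r,m} H$, arises from the following diagram, where (1) and (2) are pushouts. $G$, $m$ and $H$ are called start graph, match, and result graph, respectively.

$$
\begin{array}{ccccc}
L & \xleftarrow{le} & I & \xrightarrow{ri} & R \\
{\scriptstyle m}\downarrow & (1) & \downarrow{\scriptstyle d} & (2) & \downarrow{\scriptstyle m'} \\
G & \longleftarrow & D & \longrightarrow & H
\end{array}
$$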


A rule application exists iff the match m fulfills the gluing condition, which, in the category of graphs, boils down to the dangling condition: all edges adjacent to a deleted node in m’s image m[L] must have a preimage in L.
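As a rough illustration of the dangling condition, the following Python sketch checks it for a match given as separate node and edge mappings. The `Graph` class and the function name are hypothetical; graphs are assumed to be directed multigraphs with edges stored as `id -> (source, target)`, and typing and attributes are ignored.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    nodes: set
    edges: dict  # edge id -> (source node, target node)


def dangling_condition_holds(graph, node_match, edge_match, deleted_nodes):
    """Return True if deleting the matched nodes leaves no dangling edge.

    node_match / edge_match: the match m, as dicts from L's nodes / edges to G's.
    deleted_nodes: the nodes of L that have no preimage in the interface I.
    """
    image_edges = set(edge_match.values())  # edges of G inside m[L]
    for l_node in deleted_nodes:
        g_node = node_match[l_node]
        for edge_id, (src, tgt) in graph.edges.items():
            # An edge adjacent to a deleted node must itself be matched by L.
            if g_node in (src, tgt) and edge_id not in image_edges:
                return False
    return True


# Usage: L = (u) -y-> (x); the rule deletes node x (and thereby edge y).
G = Graph({"a", "b", "c"}, {"e1": ("a", "b"), "e2": ("c", "b")})
print(dangling_condition_holds(G, {"u": "a", "x": "b"}, {"y": "e1"}, {"x"}))  # False
```

Here the edge e2 is adjacent to the deleted node b but has no preimage in L, so the rule is not applicable at this match.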


Product Lines. Our formalization represents product lines on the semantic 
level by considering interrelations between the included graphs. The domain 
model is a “maximal” graph of which all products are sub-graphs. The presence-condition function maps sub-graphs (rather than elements, as done on the syntactic level) to terms in the boolean term algebra over features, written $T_{BOOL}(F_P)$. The set of all sub-graphs of the domain model is written $\mathcal{P}(M_P)$.


Definition 2 (Product line, configuration, product) 


- A product line $P = (F_P, \Phi_P, M_P, f_P)$ consists of three parts: a feature model that consists of a set $F_P$ of features and a set of feature constraints $\Phi_P \subseteq T_{BOOL}(F_P)$, a domain model $M_P$ given as a graph, and a set of presence conditions expressed as a function $f_P : \mathcal{P}(M_P) \to T_{BOOL}(F_P)$.

- Given a set of features F, a configuration is a total function $c : F \to \{true, false\}$. A configuration c satisfies a term $t \in T_{BOOL}(F)$ if t evaluates to true when each variable v in t is substituted by c(v). A configuration c is valid w.r.t. a set of constraints $\Phi$ if c satisfies every constraint in $\Phi$.

- Given a product line $P = (F_P, \Phi_P, M_P, f_P)$, a product $P_c$ is derived from P under the valid configuration c if $P_c$ is the union of all those graphs $M' \subseteq M_P$ for which $f_P(M')$ is satisfied by c: $P_c = \bigcup \{ M' \subseteq M_P \mid c \text{ satisfies } f_P(M') \text{ and } c \text{ is valid w.r.t. } \Phi_P \}$. The flattening of P is the set Flat(P) of all products of P: $\mathit{Flat}(P) = \{ P_c \mid P_c \text{ is a product of } P \}$.


Definition 3 (Lifted rule application). Given a product line P, a rule r, and a match $m : L \to M_P$, a lifted rule application $P \Rrightarrow_{r,m} Q$ is a construction that relates P to a product line Q s.t. $F_P = F_Q$, $\Phi_P = \Phi_Q$, and the set of products Flat(Q) is the same as if r were applied to each product $P_i \in \mathit{Flat}(P)$ for which an inclusion $j : m[L] \to P_i$ from the image of m exists.


Salay et al. [6] provide an algorithm for which it is shown that the proper- 
ties required in Definition 3 apply. The algorithm extends a rule application to 
the domain model by a check that the match can be mapped to at least one 
product, and by dedicated presence condition handling during additions and 
deletions. A more declarative treatment is offered by the product line pushout construction of Taentzer et al. [25], which is designed to support lifted rule application as a special case.
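To make the presence-condition handling during lifting more concrete, the following sketch assumes a simplified scheme in the spirit of [6]; it is not a reproduction of that algorithm. The applicability check asks whether the feature constraints together with the conjunction of the matched elements' presence conditions are satisfiable; created elements then receive that conjunction as their presence condition, and deleted elements are kept only in products to which the rule does not apply. Satisfiability is checked by brute force for illustration; element names and the function signature are hypothetical.

```python
from itertools import product


def satisfiable(formula, features):
    """Brute-force satisfiability check (a stand-in for a real SAT solver)."""
    names = sorted(features)
    return any(formula(dict(zip(names, values)))
               for values in product([True, False], repeat=len(names)))


def lift_presence_conditions(pcs, features, phi_p, matched, deleted, created):
    """Return updated presence conditions, or None if the match fits no product.

    pcs: element -> predicate over configurations (dict feature -> bool).
    phi_p: predicate encoding the conjunction of the feature constraints.
    """
    matched_pcs = [pcs[e] for e in matched]

    def phi_apply(c):
        # The match exists in exactly those products that contain all matched elements.
        return all(pc(c) for pc in matched_pcs)

    if not satisfiable(lambda c: phi_p(c) and phi_apply(c), features):
        return None  # early termination: no valid product contains the match
    new_pcs = dict(pcs)
    for e in deleted:
        old = pcs[e]
        # Keep e only in products to which the rule does not apply.
        new_pcs[e] = lambda c, old=old: old(c) and not phi_apply(c)
    for e in created:
        new_pcs[e] = phi_apply  # created elements exist where the rule applies
    return new_pcs


# Hypothetical elements: a state, and an action present only under Delay.
pcs = {"state": lambda c: True, "action": lambda c: c["Delay"]}
lifted = lift_presence_conditions(pcs, {"Delay", "Heat"}, lambda c: True,
                                  matched=["state", "action"],
                                  deleted=["action"], created=["entryAction"])
print(lifted["entryAction"]({"Delay": True, "Heat": False}))   # True
print(lifted["entryAction"]({"Delay": False, "Heat": False}))  # False
```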


Variability-Based Transformation. VB rules are defined similarly to product 
lines, with a “maximal” rule instead of a domain model, and a notion of subrules 
instead of subgraphs. A subrule is a rule that can be embedded into a larger rule 
injectively s.t. the actions of rule elements are preserved [12], e.g., deletions are 
mapped to deletions. The set of all subrules of a rule r is written $\mathcal{P}(r)$.


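Definition 4 (Variability-based (VB) rule). A VB rule $\check{r} = (F_{\check{r}}, \Phi_{\check{r}}, r_{\check{r}}, f_{\check{r}})$ consists of three parts: a feature model that consists of a set $F_{\check{r}}$ of features and a set of feature constraints $\Phi_{\check{r}} \subseteq T_{BOOL}(F_{\check{r}})$, a maximal rule $r_{\check{r}}$, and a set of presence conditions expressed as a function $f_{\check{r}} : \mathcal{P}(r_{\check{r}}) \to T_{BOOL}(F_{\check{r}})$.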


To later consider the base rule, that is, a maximal subrule of multiple 
flat rules, we define the flattening of VB rules in terms of consecutive inter- 
section and union constructions, expressed as multi-pullbacks and -pushouts 
[12]. The multi-pullback yields the base rule $r_0$, over which the flat rule arises by multi-pushout.


Definition 5 (Flat rule). Given a VB rule ř, for a valid configuration c w.r.t. $\Phi_{\check{r}}$, there exists a unique set of n subrules $S_c \subseteq \mathcal{P}(r_{\check{r}})$ s.t. $\forall s \in \mathcal{P}(r_{\check{r}}): s \in S_c \iff c$ satisfies $f_{\check{r}}(s)$. Merging these subrules via multi-pullback and multi-pushout over $r_{\check{r}}$ and $r_0$, respectively, yields a rule $r_c$, called flat rule induced by c. The flattening of ř is the set Flat(ř) of all flat rules of ř: $\mathit{Flat}(\check{r}) = \{ r_c \mid r_c \text{ is a flat rule of } \check{r} \}$.

In the example, $r_{\check{r}}$ is the rule A+B, ignoring presence conditions. Given the configuration c = {foldEntry = true, foldExit = false}, the multi-pullback over each subrule whose presence condition is satisfied by c yields as the base rule $r_0$ precisely the part of rule A+B without presence conditions (i.e., only the states). The resulting flat rule $r_c$ is isomorphic to rule A.
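The derivation of flat rules and of the base rule in this example can be approximated with sets. The sketch assumes presence conditions on individual rule elements (as in line 6 of Algorithm 1 in Sect. 5): a flat rule keeps the elements whose presence condition holds, and the base rule is approximated as the intersection of all flat rules instead of a categorical multi-pullback. The element names are hypothetical stand-ins for the states and actions of rule A+B; the constraint that exactly one of foldEntry and foldExit is selected mirrors the two valid configurations discussed in Sect. 5.

```python
from itertools import product


def valid_configs(features, constraints):
    """Valid configurations of the VB rule's feature model, as dicts."""
    names = sorted(features)
    for values in product([True, False], repeat=len(names)):
        config = dict(zip(names, values))
        if all(phi(config) for phi in constraints):
            yield config


def flat_rule(rule_pcs, config):
    """r_c: the elements of the maximal rule whose presence condition c satisfies."""
    return frozenset(e for e, pc in rule_pcs.items() if pc(config))


def flatten_vb_rule(features, constraints, rule_pcs):
    """Flattening of the VB rule: one flat rule per valid configuration."""
    return [flat_rule(rule_pcs, c) for c in valid_configs(features, constraints)]


def base_rule(flat_rules):
    """r_0, approximated as the elements shared by all flat rules."""
    return frozenset.intersection(*flat_rules)


def exactly_one(c):
    return c["foldEntry"] != c["foldExit"]


# Hypothetical elements of rule A+B: three states and two annotated actions.
rule_pcs = {
    "state1": lambda c: True,
    "state2": lambda c: True,
    "state3": lambda c: True,
    "entryAction": lambda c: c["foldEntry"],
    "exitAction": lambda c: c["foldExit"],
}
rules = flatten_vb_rule({"foldEntry", "foldExit"}, [exactly_one], rule_pcs)
print(sorted(base_rule(rules)))  # ['state1', 'state2', 'state3']
```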

As a prerequisite for achieving efficiency during staged application, we revisit 
VB rule application. The key idea is that matches of a flat rule are composed 
from matches of all of its subrules. By considering the subrules during matching, 
we can reuse matches over several rules and identify early-exit opportunities. 


Definition 6 (VB match family, VB match, VB rule application)


- Given a variability-based rule ř, a graph G, and a valid configuration c, there exists a unique set of subrules $S_c \subseteq \mathcal{P}(r_{\check{r}})$ s.t. $\forall s \in \mathcal{P}(r_{\check{r}}): s \in S_c \iff c$ satisfies $f_{\check{r}}(s)$. A variability-based match family is a family of morphisms $(m_s : L_s \to G)_{1 \le s \le |S_c|}$ s.t. for all $m_i, m_j$ with $1 \le i, j \le |S_c|$ the following compatibility condition holds: $\forall x \in \mathrm{dom}(m_i) \cap \mathrm{dom}(m_j): m_i(x) = m_j(x)$.

- Given a variability-based match family $(m_s)$ for ř, G, and c, let the morphism $m_c : L_c \to G$ be obtained by the colimit property of $L_c$. If $m_c$ is a match, the pair $\check{m} = (m_c, c)$ is called a variability-based match.

- Given a variability-based match $\check{m} = (m_c, c)$ for ř and G, the application of ř at $\check{m}$ is the rule application $G \Rightarrow_{r_c, m_c} H$ of the flat rule $r_c$ to $m_c$.


In the example, a VB match family is obtained as follows: Step 1 collects matches of the LHS $L_0$. Step 2 reuses these matches to match the flat rules: according to the compatibility condition, we may extend the matches rather than start from scratch. The set of VB rule applications for a rule ř to a model G is equivalent to the set of rule applications of all flat rules in Flat(ř) to G [12, Theorem 2].
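The compatibility condition of Definition 6, and the combination of a match family into one morphism, can be sketched with morphisms represented as Python dictionaries from rule elements to graph elements. Under this representation (an assumption of the sketch), the colimit-induced morphism is simply the union of the mappings; whether the result is actually a match (e.g., injective, if injective matching is required) has to be checked separately. Function and element names are hypothetical.

```python
def compatible(m_i, m_j):
    """Compatibility condition: m_i and m_j agree on their shared domain."""
    shared = m_i.keys() & m_j.keys()
    return all(m_i[x] == m_j[x] for x in shared)


def combine_family(family):
    """Combine a match family into one morphism m_c : L_c -> G.

    Returns None if some pair of morphisms violates the compatibility condition;
    otherwise the union of all mappings, which is well defined by compatibility.
    """
    for i, m_i in enumerate(family):
        for m_j in family[i + 1:]:
            if not compatible(m_i, m_j):
                return None
    combined = {}
    for m in family:
        combined.update(m)
    return combined


# Reusing a base match (step 1) and extending it for one flat rule (step 2).
base = {"s1": "Looking", "s2": "Waiting"}
extension = {"s2": "Waiting", "a": "startAction"}
print(combine_family([base, extension]))
# {'s1': 'Looking', 's2': 'Waiting', 'a': 'startAction'}
```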


4 Multi-variability of Product Line Transformations


A variability-based rule represents a set of similar transformation rules, while a 
product line represents a set of similar models. We consider the application of a 
variability-based rule to a product line from a formal perspective. Our idea is to 
combine two principles of maximality, which, up to now, were considered in isolation: First, by applying a rule to a “maximum” of all products, the rule can be lifted efficiently to a product line (Definition 3). Second, by reusing matches of a maximal subrule, several rules can be applied efficiently to a single model (Definition 6).
We study three strategies for applying a variability-based rule ř to a product line P; the third one leads to the notion of staged rule application as introduced in Sect. 2. First, we consider the naive case of flattening ř and P and applying each rule to each product. Second, we take the two maximality principles into account to avoid the flattening of ř. Third, we use additional aspects from the first principle to avoid the flattening of P as well. We show that all strategies are equivalent in the sense that they change all of P’s products in the same way.


4.1 Fully Flattened Application 


Definition 7 (Fully flattened application). Given the flattening of a product line P and the flattening of a rule family ř, the set of fully flattened rule applications $\mathit{Trans}_{FF}(P, \check{r})$ arises from applying each rule to each product:

$$\mathit{Trans}_{FF}(P, \check{r}) = \{ P_i \Rightarrow_{r_c, m_c} Q_i \mid P_i \in \mathit{Flat}(P),\ r_c \in \mathit{Flat}(\check{r}),\ \text{match } m_c : L_c \to P_i \}$$
In the example, there are two rules and six products; however, a match, and therefore a rule application, exists only for two products, namely those arising from configurations with Delay = true and Heat = false, as we saw in the earlier description of the example. $\mathit{Trans}_{FF}(P, \check{r})$ comprises the resulting two rule applications.


4.2 Partially Flattened Application 


We now consider a strategy that aims to avoid flattening the variability-based rule ř. We use the fact that the rules in ř generally share a maximal, possibly empty subrule $r_0$ that can be embedded into all rules in ř. Moreover, we exploit the fact that each product has an inclusion into the domain model.

The key idea is as follows: each match of a flat rule to a product includes a match of $r_0$ into the domain model $M_P$. Absence of such a match implies that none of the rules in ř has a match, allowing us to stop without considering any flat rule in its entirety. Such an exit point is particularly beneficial if the VB rule represents a subset of a larger rule set in which only a few rules can be matched at one time. Conversely, if a match for $r_0$ exists, a rule application arises if the match can be “rerouted” onto one of the products $P_i$. In this case, we consider the flat rules, saving redundant matching effort by reusing the matches of $r_0$.


Fig. 5. Partially flattened rule application.


To reuse matches to the domain model for the products, we introduce the 
rerouting of a morphism from its codomain onto another graph G’. We omit 
naming the codomain and G’ explicitly where they are clear from the context.


Definition 8 (Rerouted morphism). Let an inclusion $i : G' \to G$, a morphism $m : L \to G$ with an epi-mono-factorization $(e, m')$, and a morphism $j : m[L] \to G'$ be given, s.t. $m' = i \circ j$. The rerouted morphism $\mathit{reroute}(m, G') : L \to G'$ arises by composition: $\mathit{reroute}(m, G') = j \circ e$.
Definition 9 (Rerouted variability-based match). Let a graph G, a variability-based rule ř with a variability-based match $\check{m} = (m_c, c)$ (Definition 6), and an inclusion $i : G' \to G$ be given. If the epi-mono-factorization of $m_c$ and a suitable morphism j exist, a rerouted morphism onto G' arises (Definition 8). Pairing this morphism with the configuration c induces the rerouted variability-based match of $\check{m}$ onto G': $\mathit{reroute}(\check{m}, G') = (\mathit{reroute}(m_c, G'), c)$.
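With inclusions represented as subsets of elements and morphisms as dictionaries (both assumptions of this sketch), Definitions 8 and 9 reduce to a containment test: since i is injective, a suitable j exists exactly when the image m[L] lies within G', and then j ∘ e assigns every element of L as m does, only into G'. The function names and data below are hypothetical.

```python
def reroute(m, sub_elements):
    """Reroute m : L -> G onto the subgraph G' of G given by sub_elements.

    As i : G' -> G is an inclusion, j : m[L] -> G' with m' = i . j exists
    exactly when m[L] lies inside G'; reroute(m, G') = j . e then maps each
    element of L as m does, but into G'. Returns None if no such j exists.
    """
    if not set(m.values()) <= set(sub_elements):
        return None
    return dict(m)


def reroute_vb(vb_match, sub_elements):
    """Rerouted VB match: pair the rerouted morphism with the configuration."""
    m_c, config = vb_match
    rerouted = reroute(m_c, sub_elements)
    return None if rerouted is None else (rerouted, config)


# Hypothetical match into the domain model and two products.
m_c = {"s": "Waiting", "a": "delayAction"}
with_delay = {"Waiting", "Washing", "delayAction"}
without_delay = {"Waiting", "Washing"}
print(reroute(m_c, with_delay) is not None)  # True
print(reroute(m_c, without_delay))           # None
```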


In Fig. 5, $m_{c,n}$ is the morphism obtained by rerouting a match $m_c$ from the domain model $M_P$ to product $P_n$. For example, if $m_c$ is the match indicated in steps 1 and 2 of Fig. 4, the morphism j and, consequently, $m_{c,n}$ exist only for products in which all images of the mappings exist as well, e.g., the product shown on the right of Fig. 2. Note that $(m_c, c)$ is a variability-based match to $M_P$: In an earlier explanation, we saw that the family $(m_i)$ forms a variability-based match family. Therefore, per Definition 9, pairing $m_{c,n}$ with the configuration c induces a variability-based match to $P_n$, which can be used as follows.

Variability-based rule application (Definition 6) allows us to save matching 
effort by matching the shared parts of rules to a graph only once. The following
definition allows us to lift this insight from graphs onto product lines. We show 
that the sets of partially and fully flattened rule applications are equivalent. 


Definition 10 (Partially flattened application). Given a variability-based rule ř and a product line P, the set of partially flattened rule applications $\mathit{Trans}_{PF}(P, \check{r})$ is obtained by rerouting all variability-based matches from the domain model $M_P$ to products in P and collecting all resulting rule applications:

$$\mathit{Trans}_{PF}(P, \check{r}) = \{ P_i \Rightarrow_{\check{r}, \check{m}'} Q_i \mid \check{m} = (m_c, c) \text{ is a VB match of } \check{r} \text{ to } M_P,\ P_i \in \mathit{Flat}(P),\ \check{m}' = (\mathit{reroute}(m_c, P_i), c) \text{ is a VB match} \}$$


Theorem 1 (Equivalence of fully and partially flattened rule applications). Given a product line P and a variability-based rule ř, $\mathit{Trans}_{FF}(P, \check{r}) = \mathit{Trans}_{PF}(P, \check{r})$.


Proof idea.¹ For every fully flattened (FF) rule application, we can find a corresponding partially flattened (PF) one, and vice versa: Given an FF rule application at a match m', we compose m' with the product inclusion into the domain model $M_P$ to obtain a match $m_c$ into $M_P$. Per Theorem 2 in [12], $m_c$ induces a VB match and rule application. From a diagram chase, we see that m' is the morphism arising from rerouting $m_c$ onto the product $P_i$. Consequently, the rule application is PF. Conversely, a PF variability-based rule application induces a corresponding FF rule application by its definition.
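Theorem 1 can be sanity-checked on a toy instance, which is of course no substitute for the proof. The sketch below assumes node-only graphs with labels, injective label-preserving matching by brute force, products given as node sets included in the domain model, and flat rules given directly. It computes the fully flattened matches (each flat rule against each product) and the partially flattened ones (each flat rule against the domain model, then rerouted onto every product containing the image) and compares the two sets. All names and data are hypothetical.

```python
from itertools import permutations


def matches(rule, nodes, labels):
    """All injective, label-preserving maps from rule nodes into `nodes`."""
    rule_nodes = sorted(rule)
    found = []
    for targets in permutations(sorted(nodes), len(rule_nodes)):
        if all(rule[r] == labels[t] for r, t in zip(rule_nodes, targets)):
            found.append(dict(zip(rule_nodes, targets)))
    return found


def freeze(m):
    return frozenset(m.items())


def fully_flattened(products, flat_rules, labels):
    """Match every flat rule directly against every product."""
    return {(p, name, freeze(m))
            for p in products
            for name, rule in flat_rules.items()
            for m in matches(rule, p, labels)}


def partially_flattened(domain, products, flat_rules, labels):
    """Match each flat rule once against the domain model, then reroute."""
    result = set()
    for name, rule in flat_rules.items():
        for m in matches(rule, domain, labels):
            for p in products:
                if set(m.values()) <= p:  # rerouting onto p exists
                    result.add((p, name, freeze(m)))
    return result


# Hypothetical toy instance: two products included in the domain model.
labels = {"w": "State", "x": "State", "t": "Action"}
domain = {"w", "x", "t"}
products = [frozenset({"w", "x"}), frozenset({"w", "x", "t"})]
flat_rules = {"A": {"s": "State", "a": "Action"}, "B": {"s1": "State", "s2": "State"}}
ff = fully_flattened(products, flat_rules, labels)
pf = partially_flattened(domain, products, flat_rules, labels)
print(ff == pf, len(ff))  # True 6
```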


4.3 Staged Application 


The final strategy we consider, staged application, aims to avoid flattening the products as well. This can be achieved by employing lifting (Definition 3): Lifting takes a single rule and applies it to a domain model and its presence conditions in such a way as if the rule had been applied to each product individually. The considered rule in our case is a flat rule with a match to the domain model.
Note that we cannot compare the set of staged applications directly to the set of flattened applications, since staged applications do not live on the product level. We can, however, compare the sets of products obtained from both sets of applications, which turn out to be the same, thus showing the correctness of our approach.

¹ A full proof is provided in the extended version of this paper: http://danielstrueber.de/publications/SPJ18.pdf.


Definition 11 (Staged application). Given a variability-based rule ř and a product line P, the set of staged applications $\mathit{Trans}_{ST}(P, \check{r})$ is the set of lifted rule applications obtained from VB matches to the domain model $M_P$:

$$\mathit{Trans}_{ST}(P, \check{r}) = \{ P \Rrightarrow_{r_c, m_c} Q \mid \check{m} = (m_c, c) \text{ is a VB match of } \check{r} \text{ to } M_P \}$$

Corollary 1 (Equivalence of staged and partially flattened rule applications). Given a product line P and a variability-based rule ř, the sets of products obtained from $\mathit{Trans}_{ST}(P, \check{r})$ and $\mathit{Trans}_{PF}(P, \check{r})$ are isomorphic.


Proof. Since both sets are defined over the same set of matches of flat rules, the 
proof follows straight from the definition of lifting. 


5 Algorithm 


We present an algorithm for implementing the staged application of a VB rule ř to a product line P. Following the overview in Sect. 2 and the treatment in Sect. 4, the main idea is to proceed in three steps: First, we match the base rule of ř to the domain model, ignoring presence conditions. Second, we consider individual rules as far as necessary to obtain matches to the domain model. Third, based on the matches, we perform the actual rule application by using the lifting algorithm from [6] in a black-box manner.

Algorithm 1. Staged application.
Input : Product line P, VB rule ř
Output: Transformed product line P
 1  BMatches := findMatches(Model_P, r_0);
 2  foreach m ∈ BMatches do
 3      pc_m := ⋀_{e ∈ m} pc_e;
 4      if Φ_P ∧ pc_m is SAT then
 5          foreach c ∈ configs(ř) do
 6              flatRule := r_ř.removeAll(e | c ⊭ pc_e);
 7              Matches := findMatches(Model_P, flatRule, m);
 8              lift(P, flatRule, Matches);
 9          end
10      end
11  end

Algorithm 1 shows the computation in more detail. In line 1, ř’s base rule $r_0$ is matched to the domain model $\mathit{Model}_P$, leading to a set of base matches. If this set is empty, we have reached the first exit criterion and can stop directly. Otherwise, given a match m (line 2), we check whether at least one product $P_i$ exists that m can be rerouted onto (Definition 8). To this end, in lines 3-4, we use a SAT solver to check whether there is a valid configuration of P’s feature model for which all presence conditions of matched elements evaluate to true. In this case, we iterate over the valid configurations of ř in line 5 (we may proceed in a more fine-grained way by using partial configurations; this optimization is omitted for simplicity). In line 6, a flat rule is obtained by removing all elements from the rule whose presence condition evaluates to false. We match this rule to the domain model in line 7; to save redundant effort, we restrict the search to matches that extend the current base match. Absence of such a match is the second stopping criterion. Otherwise, we feed the flat rule and the set of matches to lifting in line 8. Handling dangling conditions is left to lifting; in the positive case, P is transformed afterwards.

Table 2. Subject rule set.
Category        #Rules   #VB rules
Create/Set         274         171
Delete/Unset       164         121
Change/Move        966         212
Total             1404         504

Table 3. Subject product lines.
SPL           #Elements   #Products
1: InCar            116          54
2: E2E              130          94
3: JSSE          24,077          64
4: Notepad          252         512
5: Mobile         4,069       3,072
6: Lampiro       29,045       5,892
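Algorithm 1 can be sketched in Python on a toy encoding: node-only, label-based matching, presence conditions as predicates, brute-force satisfiability in place of a SAT solver, and a caller-supplied `lift` callback that stands in for the lifting algorithm of [6], which is not reproduced here. The dictionary layout and all function names are hypothetical; comments refer to the lines of Algorithm 1.

```python
from itertools import permutations, product


def all_configs(features):
    names = sorted(features)
    for values in product([True, False], repeat=len(names)):
        yield dict(zip(names, values))


def find_matches(rule, labels, base=None):
    """Injective, label-preserving matches of `rule` (node -> label) into the
    domain model (node -> label), restricted to extensions of `base`."""
    base = base or {}
    free = sorted(n for n in rule if n not in base)
    unused = sorted(set(labels) - set(base.values()))
    for targets in permutations(unused, len(free)):
        m = {**base, **dict(zip(free, targets))}
        if all(rule[r] == labels[t] for r, t in m.items()):
            yield m


def staged_application(pl, vb_rule, lift):
    """Toy sketch of Algorithm 1; `lift` stands in for the lifting of [6]."""
    elements = vb_rule["elements"]  # node -> (label, presence condition)
    rule_configs = [c for c in all_configs(vb_rule["features"]) if vb_rule["phi"](c)]
    # Base rule r_0: the elements present under every valid rule configuration.
    base_rule = {n: lab for n, (lab, pc) in elements.items()
                 if all(pc(c) for c in rule_configs)}
    for m in find_matches(base_rule, pl["labels"]):               # lines 1-2
        satisfiable = any(                                         # lines 3-4
            pl["phi"](c) and all(pl["pc"][t](c) for t in m.values())
            for c in all_configs(pl["features"]))
        if not satisfiable:
            continue  # no valid product contains the base match
        for c in rule_configs:                                     # line 5
            flat = {n: lab for n, (lab, pc) in elements.items() if pc(c)}  # line 6
            found = list(find_matches(flat, pl["labels"], m))      # line 7
            if found:  # otherwise: second stopping criterion
                lift(pl, flat, found)                              # line 8


calls = []


def record(pl, flat, found):  # hypothetical stand-in for lifting
    calls.append((sorted(flat), found))


pl = {
    "features": {"Delay", "Heat"},
    "phi": lambda c: True,
    "labels": {"Waiting": "State", "Washing": "State", "act": "Action"},
    "pc": {"Waiting": lambda c: True, "Washing": lambda c: True,
           "act": lambda c: c["Delay"]},
}
vb_rule = {
    "features": {"foldEntry", "foldExit"},
    "phi": lambda c: c["foldEntry"] != c["foldExit"],
    "elements": {"s": ("State", lambda c: True),
                 "a": ("Action", lambda c: c["foldEntry"]),
                 "b": ("Exit", lambda c: c["foldExit"])},
}
staged_application(pl, vb_rule, record)
print(len(calls))  # 2: one per base match, only for foldEntry = true
```

As in the algorithm, the base match is computed once per state and merely extended per configuration; the configuration with foldExit = true finds no extension and never reaches `lift`.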

For illustration, consider the base match $m_1$ = {Looking, Waiting, Washing} from Fig. 4. First, we calculate $pc_m$. As none of the states in the domain model has a presence condition, $pc_m$ is set to true and is identified as satisfiable. Two valid configurations exist, $c_1$ = {foldEntry = true, foldExit = false} and $c_2$ = {foldEntry = false, foldExit = true}. Considering $c_1$, the presence condition foldExit evaluates to false; removing the corresponding elements yields a rule isomorphic to Rule A in Fig. 3. Match $m_1$ is now extended using this rule, leading to a match as shown in step 2 of Fig. 4, which is then lifted, as discussed in the earlier explanation of the example. Step 2 is repeated for configuration $c_2$; yet, as no suitable match for $c_2$ exists, the shown transformation is the only possible one.

This algorithm benefits from the correctness results shown in Sect. 4. Specifi- 
cally, it computes staged rule applications as per Definition 11: A configuration c 
is determined in line 5, and values for match $m_c$ are collected in the set Matches.
Via Corollary 1 and Theorem 1, the effect of the rule application to the products 
is the same as if each product had been considered individually. 

In terms of performance, two limiting factors are the use of a graph matcher 
and a SAT solver; both of them perform an NP-complete task. Still, we expect 
practical improvements from our strategy of reusing shared portions of the 
involved rules and graphs, and from the availability of efficient SAT solvers that 
scale up to millions of variables [26]. This hypothesis is studied in Sect. 6. 
6 Evaluation


To evaluate our technique, we implemented it for Henshin [27,28], a graph-based 
model transformation language, and applied it to a transformation scenario with 
product lines and transformation variability. The goal of our evaluation was to 
study whether our technique indeed produces the expected performance benefits.


Setup. The transformation is concerned with the detection of applied editing 
operations during model differencing [29]. This setting is particularly interesting 
for a performance evaluation: Since differencing is a routine software develop- 
ment task, low latency of the used tools is a prerequisite for developer effective- 
ness. The rule set, called UmlRecog, is tailored to the detection of UML edit 
operations. Each rule detects a specific edit operation, such as “move method to 
superclass”, based on a pair of model versions and a low-level difference trace. 
UmlRecog comprises 1404 rules, which, as shown in Table 2, fall into three main categories: Create/Set, Change/Move, and Delete/Unset. To study the effect of our technique on performance, an encoding of the rules into VB rules was required. We obtained this encoding using RuleMerger [18], a tool for generating VB rules from classic ones based on clustering and clone detection [30]. We obtained 504 VB rules, each representing between 1 and 71 classic rules. UmlRecog is publicly available as part of a benchmark transformation set [31].

We applied this transformation to the six UML-based product lines specified in Table 3. The product lines came from diverse sources and include manually designed ones (1-2) and ones reverse-engineered from open-source projects (3-6). Each product line was available as a UML model annotated with presence conditions over a feature model. To produce the model version pairs used by UmlRecog, we automatically simulated development steps by nondeterministically applying rules from a set of edit rules to the product lines, using the lifting algorithm to account for presence conditions during the simulated editing step.


Table 4. Execution times (in seconds) of the lifting and the staged approach. 


              Create/Set            Delete/Unset          Change/Move           Total
SPL           Lift  Stage  Factor   Lift  Stage  Factor   Lift  Stage  Factor   Lift   Stage  Factor
InCar         2.13  0.52   4.1      0.23  0.12   1.9      7.28  0.86   8.5      9.66   1.49   6.5
E2E           1.99  0.82   2.4      0.35  0.32   1.1      7.28  0.95   7.7      9.62   2.12   4.5
JSSE          2.00  0.51   3.9      0.24  0.16   1.5      8.40  3.08   2.7      10.61  3.79   2.8
Notepad       2.05  0.66   3.1      0.26  0.14   1.9      7.01  1.64   4.3      9.38   2.47   3.8
Mobile        2.00  0.55   3.7      0.24  0.13   1.9      8.28  1.62   5.1      10.55  2.26   4.7
Lampiro       2.05  0.64   3.2      0.26  0.15   1.7      8.25  2.58   3.2      10.55  3.29   3.2


As a baseline for comparison, we considered the lifted application of each rule in UmlRecog. An alternative baseline of applying VB rules to the flattened set of products was not considered: The SPL variability in our setting is much greater than the rule variability, which implies a high performance penalty when enumerating products. Since we currently do not support advanced transformation features, e.g., negative application conditions and amalgamation, we used variants of the flat and the VB rules without these concepts. We used an Ubuntu 17.04 system (Oracle JDK 1.8, Intel Core i5-6200U, 8GB RAM) for all experiments.


Results. Table 4 gives an overview of the results of our experiments. The total execution times for our technique were between 1.5 and 3.8 s, compared to between 9.4 and 10.6 s for lifting, yielding speedups by factors between 2.8 and 6.5. For both techniques, all execution times are in the same order of magnitude across product lines. A possible explanation is that the number of applicable rules was small: if the vast majority of rules can be discarded early in the matching process, the execution time remains roughly constant for a fixed number of rules.
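The Factor columns of Table 4 are the ratio of the lifting time to the staged time. The short script below recomputes the Total Factor column from the Total Lift and Stage times and reports the range of total staged execution times; it only re-derives numbers already present in Table 4.

```python
# Total execution times in seconds from Table 4: (lifting, staged).
totals = {
    "InCar": (9.66, 1.49),
    "E2E": (9.62, 2.12),
    "JSSE": (10.61, 3.79),
    "Notepad": (9.38, 2.47),
    "Mobile": (10.55, 2.26),
    "Lampiro": (10.55, 3.29),
}

# Speedup factor = lifting time / staged time, rounded as in Table 4.
factors = {spl: round(lift / stage, 1) for spl, (lift, stage) in totals.items()}
staged = [stage for _, stage in totals.values()]

print(factors)
print(min(factors.values()), max(factors.values()))  # 2.8 6.5
print(min(staged), max(staged))                      # 1.49 3.79
```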

The greatest speedups were observed for the Change/Move category, in which 
rule variability was the greatest as well, indicated by the ratio between rules 
and VB rules in Table 2. This observation is in line with our rationale of reusing 
shared matches between rules. Regarding the number of products, no trend towards better scalability of our technique is apparent, suggesting that lifting alone is sufficient for handling product-line variability. Still, based on the overall results, the hypothesis that our technique improves performance in situations with significant product-line and transformation variability can be confirmed.


Threats to Validity. Regarding external validity, we only considered a limited 
set of scenarios, based on six product lines and one large-scale transformation. 
We aim to apply our technique to a broader class of cases in the future. The 
version pairs were obtained in a synthetic process, arguably one that produces 
pessimistic cases. Our treatment so far is also limited to a particular transfor- 
mation paradigm, AGT, and one variability paradigm, the annotative one. Still, 
AGT and annotative variability are the underlying paradigms of many state- 
of-the-art tools. Finally, we did not consider the advanced AGT concepts of 
negative application conditions and amalgamation in our evaluation; extending 
our technique accordingly is left as future work. 


7 Related Work 


During an SPL’s lifecycle, not only the domain model, but also the feature 
model evolves [32,33]. To support the combined transformation of domain and 
feature models, Taentzer et al. [25] propose a unifying formal framework which 
generalizes Salay et al.’s notion of lifting [6], yet in a different direction than us: 
focusing on combined changes, this approach is not geared towards internal variability
of rules; similar rules are considered separately. Both works could be combined 
using a rule concept with separate feature models for rule and SPL variability. 
Beyond transformations of SPLs, transformations have been used to imple- 
ment SPLs. Feature-oriented development [34] supports the implementation of 
features as additive changes to a base product. Delta-oriented programming [35] 
adds flexibility to this approach: changes are specified using deltas that sup- 
port deletions and modifications as well. Impact analysis in an evolving SPL can 
be performed by transforming deltas using higher-order deltas that encapsulate
certain evolution operators [5]. For increased flexibility regarding inter-product 
reuse, deltas can be combined with traits [36]. Sijtema [8] introduced the concept 
of variability rules to develop SPLs using ATL. Conversely, SPL techniques have 
been applied to certain problems in transformation development. Xiao et al. 
[37] propose to capture variability in the backwards propagation of bidirectional 
transformations by turning the left-hand-side model into an SPL. Hussein et al.
[10] present a notion of rule templates for generating groups of similar rules 
based on a data provenance model. These works address only one dimension of 
variability, either of an SPL or a transformation system.

In the domain of graph transformation reuse, rule refinement [9] and amalga- 
mation [38] focus on reuse at the rule level; graph variability is not in their scope. 
Rensink and Ghamarian propose a solution for rule and graph decomposition 
based on a certain accommodation condition, under which the effect of the original
rule application is preserved [39,40]. In our approach, by matching against the 
full domain model rather than decomposing it, we trade off compositionality for 
the benefit of imposing fewer restrictions on graphs and rules. 


8 Conclusion and Future Work 


We propose a methodology for software product line transformations in which 
not only the input product line, but also the transformation system contains 
variability. At the heart of our methodology, a staged rule application technique
exploits the reuse potential of shared portions of the involved products
and rules. We showed the correctness of our technique and demonstrated its 
benefit by applying it to a practical software engineering task. 

In the future, we aim to explore further variability dimensions, e.g., meta- 
model variability as considered in [41], and to extend our work to advanced 
transformation features, such as application conditions. We aim to address addi- 
tional variability mechanisms and to perform a broader evaluation. 